Skip to content
Annatech_

Private AI / Local LLM

Serious AI capability. Your hardware, your data, full stop.

For firms whose data cannot go to a cloud AI vendor - by regulation, by contract, or by common sense. Open-weight models on infrastructure you control, engineered into a platform your teams will actually use.

The offer

What a private AI platform includes

Modern open-weight models are good enough to run chat, document work, research assistance and agents for most business use - if someone engineers the platform around them properly: model serving tuned to your hardware, a frontend your staff adopts, retrieval over your documents, tool access under real permissions, and operations that survive a disk failure on a Saturday.

That is the package. Not a proof-of-concept notebook - a running system with an owner, an update policy and an audit trail. GDPR data-residency questions become short conversations when the honest answer is "it never leaves the building."

external AI API calls - prompts and documents never leave your network
0
tokens of context on self-hosted open-weight models
262K
agent tools running behind tiered authentication
30+
local voice pipeline - Whisper STT, Piper TTS
PL + EN

Why us

We run what we sell

Our own operations run on the platform we offer: dual-backend GPU inference, a curated multi-model chat frontend, an agent tool layer with tiered permissions, local voice, image generation and self-healing infrastructure - with zero external AI APIs. It is not a lab exercise; it answers our questions and runs our automations every day, and it has been hardened by real failures, not hypothetical ones.

When we size your deployment, recommendations come from operating this stack - which quantizations hold up, where context length actually matters, what breaks under memory pressure - not from a vendor's benchmark slide.

YOUR NETWORK - NOTHING LEAVES IT external AI APIs: 0 GPU inference node 4x RTX A5000 · 98 GB VRAMvLLM · FP8 27B · 262K ctx Apple-silicon node MoE model · 163K ctxlow-power redundancy Local voice Whisper STT · Piper TTSPolish + English Chat & agent frontend Open WebUI · 12 curated model presets Permission tiers LAN = admin · remote = tokenrate limits · deny-lists MCP tool layer — 30+ tools research · documents · images · facility integrations Operations Proxmox 2-node · watchdogsVLAN segmentation · backups
The reference platform as it runs today - adapted per client for hardware, model policy and compliance posture.
private AI platform — live inventory
$ platform status
inference backends   2   gpu: 4x A5000 / vLLM / FP8 27B / 262K ctx · apple-silicon: MoE / 163K ctx
model presets        12  per-task system prompts + parameters
mcp tools            30+ research · documents · images · facility integrations
auth tiers           2   lan = admin scope · remote = bearer token + rate limits
voice                PL EN whisper stt · piper tts, fully local
external AI APIs     0   no prompt, document or token leaves the network

Counts and configuration are the real, operating platform.

Private AI / Infrastructure

A private AI platform with zero external AI APIs

Dual-backend LLM inference, 30+ agent tools behind tiered auth, voice, image generation and self-healing infrastructure - a complete AI platform where no prompt, document or token ever leaves the network.

Read the case study

Who this fits

  • Legal, medical, financial and public-sector practices with confidentiality obligations
  • Manufacturers protecting process know-how and supplier data
  • Any firm whose contracts forbid third-party data processing
  • Teams that want AI leverage without per-seat, per-token SaaS economics

What you get

  • Model serving (vLLM or equivalent) sized to your hardware and budget
  • Chat frontend with curated model presets per task
  • Retrieval over your documents; agent tools under permission tiers
  • Backups, monitoring, update policy - and documentation

Method

Engagement shape

  1. 01

    Feasibility & sizing

    Your use cases mapped to model classes and hardware: what runs well locally today, what needs a GPU node, what genuinely does not fit yet. Honest answers included.

  2. 02

    Pilot

    A working platform on your infrastructure or ours-as-template: inference serving, chat frontend, your documents connected, your permission model enforced.

  3. 03

    Production

    Hardening, backups, monitoring, model update policy, tool governance - the operational layer that separates a platform from an experiment.

  4. 04

    Handover or care

    Your team trained to run it, or we operate it with you. Documentation your admins can actually use.

Your data should not be someone else's training set

Describe your use case and constraints - user count, document volumes, compliance posture. You will get a realistic hardware and model recommendation, including anything that argues against going local.