AI & Autonomous Systems

Osric Labs

An autonomous multi-agent Company Operating System that runs entirely on local hardware: 21 microservices orchestrate AI workers across a distributed physical cluster -- zero cloud dependency, zero per-token costs, full operational sovereignty.

  • 4 physical nodes
  • 21 microservices
  • Zero cloud dependency
Cluster Overview -- All Systems Nominal

  • Control Plane -- Raspberry Pi gateway hub, 4 services active
  • Big Brain -- RX 7900 XTX running Qwen 32B, 6 services active
  • Worker Factory -- 6x RTX 3060 vLLM pool, 12 workers running

Agent Activity

  • orchestrator routed task:code-review to worker-3 (2s ago)
  • frontier-router escalated query to cloud fallback (14s ago)
  • skill-registry registered new skill: summarize-pr (28s ago)
  • container-manager spawned agent-worker-7 (45s ago)
  • boot-controller completed health check on all nodes (1m ago)

Inference Metrics

  • Qwen 32B (Big Brain) -- 18.4 tok/s
  • Qwen 14B (vLLM Pool) -- 42.1 tok/s
  • Qwen 7B (vLLM Pool) -- 67.3 tok/s
  • Active requests: 14
  • Queue depth: 3
  • Cloud fallbacks (24h): 2

System Views

The operational interface for a fully autonomous, self-managing AI infrastructure.

The Challenge

Running AI-driven operations at scale means constant interaction with cloud LLM APIs. For an autonomous system making hundreds of inference calls per hour, this creates three compounding problems:

  • Escalating costs -- per-token pricing on cloud APIs turns every agent action into a line item, making autonomous operation economically unsustainable at scale
  • Latency constraints -- round-trip times to cloud endpoints introduce unpredictable delays in agent decision loops, degrading real-time orchestration
  • Vendor lock-in -- dependency on external API providers means rate limits, policy changes, and outages can halt the entire system without warning

The system needed to run autonomously 24/7, self-heal when nodes fail, upgrade its own capabilities, and do all of this without a human in the loop for routine operations.

Our Approach

We designed and built a fully local-first, multi-node physical architecture that keeps all inference, orchestration, and data on-premise.

  • Local-first inference -- llama.cpp on the Big Brain node runs Qwen 2.5 32B for complex reasoning; vLLM across the Worker Factory handles high-throughput 7B/14B tasks in parallel
  • Physical distribution -- a Raspberry Pi Control Plane manages the WebSocket gateway, orchestration, Telegram interface, and boot sequencing across all nodes
  • Self-improving agents -- the Skill Registry tracks agent capabilities and training outcomes; the Planner Trainer refines orchestration strategies based on task performance data
  • Docker-isolated workers -- each agent runs in its own container with scoped credentials from the Credential Vault, enabling safe parallel execution (a container sketch follows this list)
  • Frontier Router -- when a task exceeds local model capability, the router selectively escalates to cloud APIs, keeping cloud usage under 2% of total inference (a routing sketch follows this list)
  • Self-upgrade system -- the Upgrade Manager can pull, test, and deploy new service versions across the cluster without human intervention
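
To give a flavor of the container isolation, here is a minimal TypeScript sketch of spawning one agent worker in its own Docker container with a task-scoped credential. The image name, environment variables, and the issueScopedToken helper are illustrative assumptions, not the production code:

    // Spawn a hypothetical Docker-isolated agent worker whose only credential
    // is a short-lived token scoped to a single task.
    import { execFile } from "node:child_process";
    import { promisify } from "node:util";

    const run = promisify(execFile);

    // Assumed stand-in for the Credential Vault client: issues a short-lived
    // token limited to one task (placeholder logic, not a real vault call).
    async function issueScopedToken(taskId: string): Promise<string> {
      return `scoped-${taskId}-${Date.now()}`;
    }

    async function spawnWorker(taskId: string): Promise<string> {
      const token = await issueScopedToken(taskId);
      // --rm removes the container on exit; the scoped token and task ID are
      // the only state the worker ever receives.
      const { stdout } = await run("docker", [
        "run", "-d", "--rm",
        "--name", `agent-worker-${taskId}`,
        "-e", `VAULT_TOKEN=${token}`,
        "-e", `TASK_ID=${taskId}`,
        "osric/agent-worker:latest", // assumed image name
      ]);
      return stdout.trim(); // detached mode prints the new container ID
    }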
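
And a minimal sketch of the tiered routing behind the Frontier Router, assuming a complexity score supplied by the orchestrator and OpenAI-compatible completion endpoints on the local nodes. The thresholds, URLs, and field names are assumptions for illustration:

    // Route each task to the cheapest model tier expected to handle it;
    // cloud is a last resort, which keeps fallback traffic rare.
    type Tier = "qwen-7b" | "qwen-14b" | "qwen-32b" | "cloud";

    interface Task {
      prompt: string;
      complexity: number;    // assumed 0..1 score from the orchestrator
      localAttempts: number; // failed local attempts so far
    }

    function routeTask(task: Task): Tier {
      if (task.localAttempts >= 2) return "cloud"; // repeated local failure
      if (task.complexity < 0.3) return "qwen-7b"; // high-throughput pool
      if (task.complexity < 0.6) return "qwen-14b"; // mid-tier pool
      return "qwen-32b"; // Big Brain reasoning node
    }

    // Illustrative endpoints; vLLM and llama.cpp's server both expose
    // OpenAI-compatible completion APIs.
    const endpoints: Record<Tier, string> = {
      "qwen-7b": "http://worker-pool:8000/v1/completions",
      "qwen-14b": "http://worker-pool:8001/v1/completions",
      "qwen-32b": "http://big-brain:8080/v1/completions",
      cloud: "https://api.example.com/v1/completions", // placeholder URL
    };

    async function runTask(task: Task): Promise<string> {
      const tier = routeTask(task);
      const res = await fetch(endpoints[tier], {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ prompt: task.prompt, max_tokens: 512 }),
      });
      if (!res.ok) {
        // Count the failure and re-route; two failures escalate to cloud.
        return runTask({ ...task, localAttempts: task.localAttempts + 1 });
      }
      const data = (await res.json()) as { choices: { text: string }[] };
      return data.choices[0].text;
    }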

Tech Stack

The tools and technologies powering the Osric Labs autonomous operating system.

TypeScript, PostgreSQL, Redis, Docker, systemd, llama.cpp, vLLM, Qwen 2.5, WebSocket, Telegram API

Let's build something worth launching.

Tell us what you're working on. We'll respond within 24 hours with honest feedback on whether we're the right fit.

Book a Free Consult

No pitch decks. No pressure. Just a real conversation.