A Self-Evolving, Sovereign AI Agent System on Consumer Hardware
RouxYou is a fully local, self-modifying, multi-agent AI system that runs entirely on consumer-grade hardware with zero cloud dependencies. It combines local LLM inference, autonomous multi-agent orchestration, safe self-modification through a blue-green deployment pipeline, episodic memory that teaches the system from its own history, and a three-tiered proposal system that lets the system suggest its own improvements. All of it runs on a single desktop with a mid-range NVIDIA GPU (16GB VRAM) and 32GB of DDR4 RAM. No data leaves the network. No API keys are required. The system answers to its operator and no one else.
The Problem
Cloud-dependent frameworks (CrewAI, LangChain, AutoGen, OpenClaw) provide orchestration and tool use but require sending every prompt, every file reference, and every personal context item to a third-party API. They are static — configured once, executed repeatedly, never improved by their own experience. The operator pays per token, per month, indefinitely. Their data is someone else's training signal.
Academic self-evolving systems (GEA, Agent0, AgentEvolver, Darwin Gödel Machine) prove that agents can improve themselves through evolutionary methods, reinforcement learning, and population-based optimization. But these exist as benchmark results — not as systems anyone can run, let alone run locally on consumer hardware.
Local inference platforms (Ollama, LM Studio, LocalAI) solve the model-hosting problem but provide no orchestration, no agent coordination, no memory, and no autonomy. They are engines without vehicles.
No existing system combines all four requirements: fully local inference on consumer hardware, multi-agent orchestration with intelligent routing, self-modification with safety constraints and rollback, and persistent memory that compounds over time.
RouxYou does.
Architecture Overview
RouxYou is a multi-agent system of nine independent services (diagrammed below) running on a local network. Each service exposes health endpoints and a clean API. They communicate over HTTP and don't care whether they're on one machine or twenty.
┌─────────────┐
│ Gateway │ :8000
│ route table │
└──────┬──────┘
│
┌────────────┼────────────┐
│ │ │
┌──────┴───────┐ │ ┌───────┴──────┐
│ Orchestrator │ │ │ Watchtower │ :8010
│ :8001 │ │ │ supervisor │
│ 3B router │ │ └───────┬───────┘
└──────┬───────┘ │ │
│ │ ┌───────┴──────┐
┌────┴────┐ │ │ Cron/Coach │ :8012
│ │ │ │ proposals │
┌────┴───┐ ┌───┴───┐ │ └──────────────┘
│ Coder │ │Worker │ │
│ :8002 │ │ :8003 │ │
│ 14B │ │ 20+ │ │
│ reason │ │ caps │ │
└────────┘ └───────┘ │
│
┌────────┐ ┌────────┴──┐ ┌──────────┐
│ Memory │ │ RAG API │ │ Roux │
│ :8004 │ │ :8011 │ │ :8014 │
│ FAISS │ │ bridge │ │ voice IO │
└────────┘ └───────────┘ └──────────┘
The Gateway
A reverse proxy and route table that serves as the single entry point for all requests. The Gateway maintains a mapping of active services, enabling hot-swapping between production and staging instances during self-modification deployments. Clients never see a blip during upgrades.
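The hot-swap mechanism can be sketched in a few lines. This is an illustrative sketch, not the Gateway's actual implementation: the class and method names are invented, and the real Gateway presumably does full reverse proxying on top of a table like this. The point is that a lock-guarded swap returns the old upstream so it can be archived for rollback, and in-flight lookups always see either the old target or the new one.

```python
import threading

class RouteTable:
    """Hypothetical sketch of the Gateway's route table (names invented)."""

    def __init__(self):
        self._routes = {}          # service name -> upstream URL
        self._lock = threading.Lock()

    def set_route(self, service, upstream):
        with self._lock:
            self._routes[service] = upstream

    def resolve(self, service):
        with self._lock:
            return self._routes[service]

    def swap(self, service, staging_upstream):
        """Point a service at its staging instance; return the old
        upstream so the caller can archive it for rollback."""
        with self._lock:
            old = self._routes.get(service)
            self._routes[service] = staging_upstream
            return old

# Blue-green swap during a self-modification deploy (ports illustrative):
table = RouteTable()
table.set_route("worker", "http://127.0.0.1:8003")
previous = table.swap("worker", "http://127.0.0.1:9003")
```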
The Orchestrator
The system's intent classifier, task queue, and router, powered by a lightweight local LLM (~3B parameters, ~6GB VRAM). The Orchestrator implements a three-tier routing system that minimizes LLM overhead:
- Tier 0: Pattern-matched greetings and status checks — instant response, zero LLM calls
- Tier 1: Simple, well-defined tasks (file operations, web searches, system queries) — dispatched directly to the Worker via pattern matching, bypassing the Coder entirely
- Tier 2: Complex, multi-step tasks requiring planning — forwarded to the Coder for decomposition before execution
The Orchestrator also manages a persistent task queue with priority levels (urgent, normal, background), pause/resume controls, abort capabilities, and a full execution history. Approved proposals from the Watchtower layer enter this same queue, creating a unified pipeline for both human-initiated and system-initiated work.
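The three-tier routing decision can be sketched as a cheap classifier that only falls through to the LLM path when pattern matching fails. The patterns below are invented for illustration; the Orchestrator's real rules are not shown here.

```python
import re

# Illustrative tier patterns (assumptions, not RouxYou's actual rules).
TIER0 = re.compile(r"^(hi|hello|status|ping)\b", re.I)
TIER1 = re.compile(r"^(read|write|copy|move|delete|search)\b", re.I)

def route(message: str) -> str:
    """Classify a request into one of the three routing tiers."""
    if TIER0.match(message):
        return "tier0"           # canned response, zero LLM calls
    if TIER1.match(message):
        return "tier1:worker"    # dispatch straight to the Worker
    return "tier2:coder"         # needs planning: forward to the Coder
```

In this scheme only Tier 2 traffic ever touches the 14B model, which is what keeps routine traffic cheap on a 16GB GPU.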
The Coder
The deep reasoning engine, running a ~14B parameter model (4-bit quantization). The Coder is invoked only for Tier 2 tasks that require multi-step planning, code generation, or complex decision-making. It produces structured execution plans that the Worker carries out. The Coder loads into VRAM on demand and can be swapped for different models depending on the task domain.
The Worker
A unified execution engine with 20+ built-in capabilities and no LLM overhead. The Worker is pure Python — deterministic, fast, and predictable. Capabilities include filesystem operations (read, write, edit, copy, move, delete, directory trees), command execution, web search via a self-hosted search engine, multi-file operations, content search with regex support, screen capture and vision analysis, code verification, and deploy pipeline integration. The Worker is the primary target of self-modification — when RouxYou improves itself, it most often extends the Worker's capabilities.
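A capability registry in the Worker's style might look like the following. This is a minimal sketch: the decorator mechanism and capability names are hypothetical, not taken from the RouxYou source, but it shows the shape of pure-Python, deterministic dispatch with no LLM in the loop, and why adding a new entry to such a registry is a natural target for self-modification.

```python
import re

CAPABILITIES = {}

def capability(name):
    """Register a function under a capability name (hypothetical API)."""
    def register(fn):
        CAPABILITIES[name] = fn
        return fn
    return register

@capability("fs.read")
def read_file(path):
    with open(path, encoding="utf-8") as f:
        return f.read()

@capability("text.grep")
def grep(pattern, text):
    """Content search with regex support, one of the listed capabilities."""
    return [line for line in text.splitlines() if re.search(pattern, line)]

def execute(name, *args, **kwargs):
    """Deterministic dispatch: look up the capability and run it."""
    if name not in CAPABILITIES:
        raise KeyError(f"unknown capability: {name}")
    return CAPABILITIES[name](*args, **kwargs)
```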
The Watchtower Layer
Two services that together form the system's safety boundary and autonomous improvement engine:
The Supervisor (:8010) is an immutable component that cannot be modified by the system it oversees. It manages the blue-green deployment pipeline, holds the human-in-the-loop approval gate for all self-modifications, handles service restarts, and receives escrowed task results when agents restart during deployment.
The Cron Service (:8012) runs scheduled automation including the Proposal System (detailed below), episodic memory decay with file locking for race condition safety, schedule syncs, and the web research pipeline.
The Supervisor's immutability is a deliberate architectural constraint. A system that can modify itself must have a component it cannot reach. This is RouxYou's answer to the alignment problem at the system level.
The Proposal System
This is RouxYou's second major architectural innovation, introduced in Phase 25. The system doesn't just execute tasks — it identifies its own areas for improvement and proposes changes to its operator.
Three tiers of observation feed into a unified Proposal Bus:
Tier 1 — Heuristic Observers
Six pure-Python observers run every 30 minutes with zero LLM overhead. They check service health (are all agents responding?), memory pressure (is the episodic store bloated?), codebase drift (are there stale files or unused imports?), task queue health (are tasks stuck or failing repeatedly?), resource usage (disk space, log file sizes), and skill utilization (are capabilities going unused?).
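One such observer, sketched under assumptions: the function and field names below are invented for illustration, but the shape matches the description, plain Python that inspects state and emits findings, with no LLM involved until Tier 2.

```python
def observe_task_queue(tasks, fail_threshold=3):
    """Flag tasks that are failing repeatedly (illustrative observer;
    field names are assumptions, not RouxYou's actual schema)."""
    findings = []
    for task in tasks:
        if task.get("failures", 0) >= fail_threshold:
            findings.append({
                "category": "task_queue_health",
                "title": f"task {task['id']} failing repeatedly",
                "evidence": {"failures": task["failures"]},
                "priority": 7,
            })
    return findings
```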
Tier 2 — LLM Coach Enrichment
Raw findings from Tier 1 are passed to a local ~14B LLM for analysis. The Coach enriches proposals with root cause analysis, confidence scores (0.0–1.0), suggested remediation strategies, and cross-references with episodic memory to detect recurring patterns. Priority is adjusted dynamically — an issue that recurs three times gets escalated automatically.
Tier 3 — Web Research
A daily scheduled task searches the web via a self-hosted search engine for improvement patterns relevant to the system's technology stack. Seven rotating topic areas are covered over a weekly cycle: agent orchestration, memory retrieval, local LLM inference, deployment reliability, code intelligence, scheduling automation, and emerging patterns. Search results are evaluated by the local LLM for relevance and actionability. Only concrete, implementable findings with relevance scores above 0.6 are published as proposals.
The Proposal Bus
All three tiers publish to a shared data structure with persistent state. Each proposal carries a title, description, category, priority (1–10), source attribution, confidence score, evidence, suggested executor, and reversibility flag. Proposals track their full lifecycle: pending, approved, executing, completed, failed, or dismissed. Dismissed proposals enter a cooldown period to prevent re-proposal.
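A proposal record carrying those fields might look like the sketch below. The concrete schema in RouxYou may differ; this only mirrors the fields and lifecycle states listed above.

```python
from dataclasses import dataclass, field

# Lifecycle states as described above.
LIFECYCLE = ("pending", "approved", "executing", "completed", "failed", "dismissed")

@dataclass
class Proposal:
    """Illustrative proposal record (field names are assumptions)."""
    title: str
    description: str
    category: str
    priority: int            # 1-10
    source: str              # e.g. "heuristic", "coach", "web_research"
    confidence: float        # 0.0-1.0
    evidence: dict = field(default_factory=dict)
    executor: str = "worker"
    reversible: bool = True
    state: str = "pending"

    def transition(self, new_state: str):
        if new_state not in LIFECYCLE:
            raise ValueError(f"unknown state: {new_state}")
        self.state = new_state
```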
Human Approval Gate
Every proposal requires human approval through the dashboard before execution. The operator sees the evidence, the Coach's reasoning, the confidence score, and the proposed action. Approved proposals enter the Orchestrator's task queue as first-class tasks — the same queue that handles human-initiated work. This means every proposal's outcome is recorded in episodic memory, creating a feedback loop: the system learns from the results of its own suggestions.
The Proposal System is not a separate agent. It's a capability layer that any part of the system can contribute to (heuristics, LLM analysis, web research) and that the Orchestrator consumes. The system's "Coach" is emergent — it arises from the interaction of observation, enrichment, and memory, not from a single monolithic component.
Self-Modification Pipeline
The self-modification pipeline is RouxYou's primary architectural innovation and its clearest differentiator from every other agent framework in the current landscape.
Blue-Green Deployment
When the system generates a modification to its own code, the change does not go directly into production. Instead:
- Stage: The Deployer copies the target file, applies the patch, and creates a staging instance in an isolated directory
- Boot: The modified code is launched on a staging port. A health endpoint confirms the service is alive and functional
- Approval Gate: The Watchtower presents the proposed change to the human operator for review. This gate is immutable — the system cannot bypass it
- Swap: The Gateway's route table is updated to point to the staging instance. Production is archived for rollback
- Watchdog: A 60-second monitoring period begins. Health checks run every 5 seconds. Three consecutive failures trigger automatic rollback
- Normalize: After the watchdog passes, ports are normalized — the promoted code moves to the standard production port and the staging process is retired
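The watchdog step above reduces to a small loop: a 60-second window, health checks every 5 seconds, and three consecutive failures triggering rollback. In this sketch the clock, sleep, and health check are injected as parameters so the logic is testable without real time passing; a real caller would pass something like `time.monotonic`, `time.sleep`, and an HTTP health probe.

```python
def watchdog(check, now, sleep, window=60, interval=5, max_failures=3):
    """Monitor a freshly promoted service (sketch of the logic above).

    check: returns True if the health endpoint responds.
    now/sleep: injected clock functions, so the loop is testable.
    Returns "rollback" on three consecutive failures, else "promote".
    """
    start = now()
    consecutive = 0
    while now() - start < window:
        if check():
            consecutive = 0          # any success resets the streak
        else:
            consecutive += 1
            if consecutive >= max_failures:
                return "rollback"    # automatic rollback to archived prod
        sleep(interval)
    return "promote"                 # survived the window: normalize ports
```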
Anchor Validation
A key discovery during development: LLM-generated patches frequently reference code that doesn't actually exist in the target file — the model hallucinates anchors from memory or imagination rather than reading the actual source. RouxYou addresses this with file caching before any deploy action, anchor verification against cached content, and fuzzy correction including quote normalization, whitespace tolerance, decorator/function block rebuilding, and line-prefix matching.
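The core of anchor verification can be sketched as follows. This minimal example only shows quote normalization and whitespace tolerance; the real pipeline's fuzzy correction (decorator/function block rebuilding, line-prefix matching) goes further. The function names are illustrative.

```python
def normalize(line: str) -> str:
    """Normalize quotes and collapse whitespace before comparing."""
    return " ".join(line.replace("'", '"').split())

def find_anchor(anchor: str, cached_lines: list[str]):
    """Locate an LLM-referenced anchor in the cached file content.

    Returns the line index, or None when the model hallucinated code
    that does not exist in the target file, in which case the patch
    must be rejected or corrected before deploy.
    """
    target = normalize(anchor)
    for i, line in enumerate(cached_lines):
        if normalize(line) == target:
            return i
    return None
```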
System File Protection
System source files are blocked from direct write operations. All modifications must route through the deploy pipeline, which enforces the full blue-green sequence. This prevents the AI from accidentally or intentionally clobbering production code without safety checks.
Memory Architecture
Episodic Memory
RouxYou maintains a store of episodic memories — records of past task executions including what was attempted, what plan was generated, what succeeded, what failed, and what was learned. These episodes are injected into the Coder's context when planning similar tasks, enabling the system to learn from its own history without any model fine-tuning or retraining. Completed proposals are also recorded as episodes, creating a feedback loop between the Proposal System and future planning.
Memory Decay
Stale episodic memories can actively harm performance — teaching the system outdated patterns, incorrect endpoints, or deprecated approaches. RouxYou implements automated memory decay on a scheduled cadence. The decay system uses file locking to prevent race conditions when multiple processes access the memory store simultaneously. Episodes are scored by age, relevance, and outcome, with low-scoring entries pruned automatically.
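A decay scorer in this spirit might look like the sketch below. The weights, half-life, and field names are invented for illustration; only the idea, scoring by age, relevance, and outcome and pruning low scorers, comes from the description above.

```python
import math
import time

def decay_score(episode, now=None, half_life_days=30.0):
    """Score an episode by age, relevance, and outcome (weights assumed)."""
    now = now or time.time()
    age_days = (now - episode["timestamp"]) / 86400
    # Exponential age decay: score halves every `half_life_days`.
    age_factor = math.exp(-math.log(2) * age_days / half_life_days)
    outcome = 1.0 if episode["succeeded"] else 0.4
    return age_factor * episode.get("relevance", 0.5) * outcome

def prune(episodes, threshold=0.1, now=None):
    """Drop low-scoring episodes so stale lessons stop polluting context."""
    return [e for e in episodes if decay_score(e, now=now) >= threshold]
```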
RAG Integration
A vector-based retrieval system ingests project documentation and file contents to provide contextual grounding for agent decisions. This complements episodic memory by providing broader project context beyond individual task records.
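The retrieval step reduces to nearest-neighbor search over embedding vectors. The sketch below uses plain-Python cosine similarity over a small list to keep the example dependency-free; the actual system uses a FAISS index (per the architecture diagram), and the function names here are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the k document ids nearest to the query embedding.

    index: list of (doc_id, embedding) pairs; a stand-in for FAISS.
    """
    scored = sorted(index, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

The retrieved documents are then injected into the planning context alongside any matching episodic memories.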
What Makes RouxYou Different
| Capability | Cloud Frameworks | Academic Systems | Local Platforms | RouxYou |
|---|---|---|---|---|
| Local inference | Partial | No (cloud compute) | Yes | Yes — primary constraint |
| Multi-agent orchestration | Yes | N/A | No | Tiered routing w/ LLM bypass |
| Self-modification | No | Yes (benchmark only) | No | Blue-green w/ rollback |
| Safety constraints | N/A | Theoretical | N/A | Immutable Watchtower |
| Self-improvement proposals | No | No | No | 3-tier observe → propose → execute |
| Episodic learning | File-based notes | Population-based | No | Task-level w/ automated decay |
| Zero cloud dependency | No | No | Partial | Complete stack |
| Consumer GPU | N/A | No | Yes | 16GB VRAM, 32GB RAM |
| Production deployed | Yes | No | Yes | Daily use since Jan 2026 |
Development Timeline
This timeline establishes prior art for the architectural concepts described in this document.
System Specifications
Hardware: Mid-range consumer desktop — Intel CPU, NVIDIA RTX 5060 Ti (16GB VRAM), 32GB DDR4 RAM. No datacenter hardware. No cloud compute.
Router LLM: ~3B parameter model (~6GB VRAM) for intent classification, routing, and conversational responses.
Reasoning LLM: ~14B parameter model (4-bit quantization) for deep reasoning, code generation, multi-step planning, proposal enrichment, and web research evaluation.
Language: Python. Dashboard built with Streamlit.
Infrastructure: All supporting services (search engine, home automation, monitoring) are self-hosted; every service runs on the local network.
Enterprise Applications
RouxYou was built for a single operator on a desktop. But the architecture was designed as isolated services communicating over a network — and that pattern scales.
The same system that runs on a consumer desktop can run on an internal cluster, an air-gapped network, or a private data center. The Gateway, Orchestrator, Coder, Worker, and Watchtower are independent services with health endpoints and clean APIs. They don't care whether they're on one machine or twenty.
The Enterprise Problem
Organizations across healthcare, finance, defense, and legal are blocked from adopting agentic AI by a single constraint: data sovereignty. Every major agent framework requires sending proprietary data — patient records, financial models, classified documents, privileged communications — through a third-party cloud API. Legal and compliance teams are right to reject this.
The result is a growing gap: enterprises that need agentic automation the most are the ones least able to adopt existing solutions.
What RouxYou Offers
A sovereign agentic pipeline where no data leaves the organization's network. No API keys to external providers. No per-token metering. No vendor lock-in. The system improves itself over time through episodic memory and self-generated proposals — without sending training data to a third party. And every self-modification passes through an immutable human approval gate with full audit trail.
The architecture naturally extends to enterprise requirements:
- Multi-tenant isolation — service-per-team or service-per-department, each with independent Orchestrator and Worker instances behind a shared Gateway
- Role-based approval gates — the Watchtower's human-in-the-loop pattern extends to multi-level approval chains with different authorization tiers
- Compliance audit trail — every task, every plan, every self-modification, every proposal is logged with full provenance. The episodic memory system is already a complete record of system behavior
- Model flexibility — the Coder can swap models per task domain. A legal department runs a model trained on case law. A finance team runs one tuned for quantitative analysis. Same infrastructure, different capabilities
- Air-gap capable — zero external dependencies means the entire stack runs on isolated networks with no internet connectivity required
- Autonomous improvement — the Proposal System means the deployment doesn't stagnate after installation. The system identifies its own optimization opportunities, proposes changes, and learns from outcomes — all within the organization's network boundary
The Cost Argument
A single RouxYou node runs on hardware that costs less than two months of enterprise API spend. A 10-node internal cluster providing department-level agentic AI costs less than a single year of per-token cloud billing — and the hardware doesn't expire. There are no seats, no tiers, no usage caps. The system runs until you turn it off.
RouxYou was proven on a single desktop because that's the hardest constraint to engineer around. If it runs on 16GB of VRAM and 32GB of RAM, it runs anywhere. Enterprise deployment isn't a redesign — it's removing constraints.
Philosophy
RouxYou exists because the operator believed that an AI system that manages your life should not report to someone else's server. That the orchestration layer — the part that decides what to do, when to do it, and how to improve — is too important to rent. That a system powerful enough to modify itself must have safety constraints it cannot override. And that consumer hardware, carefully engineered around, is sufficient to run what billion-dollar companies are selling as cloud services.
The name combines roux — the foundational base in Cajun and Creole cooking from which all flavor builds — with you, reflecting the system's core principle: it runs on your hardware, learns from your patterns, and answers to you alone.
A roux is simple — flour and fat, heated slowly, stirred constantly. But from that simple base, everything else is built. RouxYou is the base. What gets built on top of it is up to you.