RouxYou

A Self-Evolving, Sovereign AI Agent System on Consumer Hardware

RouxYou is a fully local, self-modifying, multi-agent AI system that runs entirely on consumer-grade hardware with zero cloud dependencies. It combines local LLM inference, autonomous multi-agent orchestration, safe self-modification through a blue-green deployment pipeline, episodic memory that teaches the system from its own history, and a three-tiered proposal system that lets the system suggest its own improvements. All of it runs on a single desktop with a mid-range NVIDIA GPU (16GB VRAM) and 32GB of DDR4 RAM. No data leaves the network. No API keys are required. The system answers to its operator and no one else.

The Problem

Cloud-dependent frameworks (CrewAI, LangChain, AutoGen, OpenClaw) provide orchestration and tool use but require sending every prompt, every file reference, and every personal context item to a third-party API. They are static — configured once, executed repeatedly, never improved by their own experience. The operator pays per token, per month, indefinitely. Their data is someone else's training signal.

Academic self-evolving systems (GEA, Agent0, AgentEvolver, Darwin Gödel Machine) prove that agents can improve themselves through evolutionary methods, reinforcement learning, and population-based optimization. But these exist as benchmark results — not as systems anyone can run, let alone run locally on consumer hardware.

Local inference platforms (Ollama, LM Studio, LocalAI) solve the model-hosting problem but provide no orchestration, no agent coordination, no memory, and no autonomy. They are engines without vehicles.

No existing system combines all four requirements: fully local inference on consumer hardware, multi-agent orchestration with intelligent routing, self-modification with safety constraints and rollback, and persistent memory that compounds over time.

RouxYou does.

Architecture Overview

RouxYou is a multi-agent system built from independent services on a local network. Each service exposes health endpoints and a clean API, and all communication happens over HTTP, so the topology is flexible: the services behave the same on one machine or many.

                  ┌─────────────┐
                  │   Gateway   │  :8000
                  │ route table │
                  └──────┬──────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
     ┌──────┴───────┐    │    ┌───────┴───────┐
     │ Orchestrator │    │    │  Watchtower   │  :8010
     │    :8001     │    │    │  supervisor   │
     │  3B router   │    │    └───────┬───────┘
     └──────┬───────┘    │            │
            │            │    ┌───────┴───────┐
       ┌────┴────┐       │    │  Cron/Coach   │  :8012
       │         │       │    │   proposals   │
  ┌────┴───┐ ┌───┴───┐   │    └───────────────┘
  │ Coder  │ │Worker │   │
  │ :8002  │ │ :8003 │   │
  │ 14B    │ │ 20+   │   │
  │ reason │ │ caps  │   │
  └────────┘ └───────┘   │
                         │
    ┌────────┐  ┌────────┴──┐  ┌──────────┐
    │ Memory │  │  RAG API  │  │   Roux   │
    │ :8004  │  │   :8011   │  │  :8014   │
    │ FAISS  │  │  bridge   │  │ voice IO │
    └────────┘  └───────────┘  └──────────┘

The Gateway

A reverse proxy and route table that serves as the single entry point for all requests. The Gateway maintains a mapping of active services, enabling hot-swapping between production and staging instances during self-modification deployments. Clients never see a blip during upgrades.
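The hot-swap mechanic can be sketched as a lock-guarded route table. The class name, ports, and swap API below are illustrative assumptions, not the actual Gateway code:

```python
import threading

class RouteTable:
    """Maps service names to backend addresses; swaps are atomic."""

    def __init__(self, routes):
        self._routes = dict(routes)
        self._lock = threading.Lock()

    def resolve(self, service):
        with self._lock:
            return self._routes[service]

    def swap(self, service, staging_addr):
        """Point a service at its staging instance; return the old
        production address so it can be archived for rollback."""
        with self._lock:
            old = self._routes[service]
            self._routes[service] = staging_addr
            return old

# The staging port here is hypothetical.
table = RouteTable({"orchestrator": "http://127.0.0.1:8001"})
previous = table.swap("orchestrator", "http://127.0.0.1:9001")
```

Because the swap returns the previous address, a supervisor can archive it and restore the old mapping if a post-deploy check fails, which is what makes upgrades invisible to clients.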

The Orchestrator

The system's intent classifier, task queue, and router, powered by a lightweight local LLM (~3B parameters, ~6GB VRAM). The Orchestrator implements a three-tier routing system that minimizes LLM overhead: recognized command patterns dispatch straight to the Worker with no LLM call at all, the 3B router classifies everything else and handles conversational turns, and only tasks that genuinely require planning are escalated to the Coder as Tier 2 work.

The Orchestrator also manages a persistent task queue with priority levels (urgent, normal, background), pause/resume controls, abort capabilities, and a full execution history. Approved proposals from the Watchtower layer enter this same queue, creating a unified pipeline for both human-initiated and system-initiated work.
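A minimal sketch of what such a queue could look like, assuming the three priority levels named above and FIFO order within a level; the class and method names are hypothetical:

```python
import heapq
import itertools

PRIORITY = {"urgent": 0, "normal": 1, "background": 2}

class TaskQueue:
    """Priority queue: urgent tasks dequeue before normal, normal
    before background, with FIFO order inside each level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves FIFO order
        self.history = []                  # full execution history

    def submit(self, task, priority="normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._counter), task))

    def next_task(self):
        _, _, task = heapq.heappop(self._heap)
        self.history.append(task)
        return task

q = TaskQueue()
q.submit("reindex logs", priority="background")
q.submit("apply approved proposal")                 # defaults to normal
q.submit("service down: restart", priority="urgent")
```

An approved proposal would be submitted through the same `submit` call as human-initiated work, which is what makes the pipeline unified.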

The Coder

The deep reasoning engine, running a ~14B parameter model (4-bit quantization). The Coder is invoked only for Tier 2 tasks that require multi-step planning, code generation, or complex decision-making. It produces structured execution plans that the Worker carries out. The Coder loads into VRAM on demand and can be swapped for different models depending on the task domain.

The Worker

A unified execution engine with 20+ built-in capabilities and no LLM overhead. The Worker is pure Python — deterministic, fast, and predictable. Capabilities include filesystem operations (read, write, edit, copy, move, delete, directory trees), command execution, web search via a self-hosted search engine, multi-file operations, content search with regex support, screen capture and vision analysis, code verification, and deploy pipeline integration. The Worker is the primary target of self-modification — when RouxYou improves itself, it most often extends the Worker's capabilities.
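One common way to structure an execution engine like this is a registry mapping stable capability names to plain functions, so that self-modification means registering one more function. The sketch below assumes that shape; the names and decorator are illustrative, not the Worker's actual code:

```python
import re

CAPABILITIES = {}

def capability(name):
    """Register a pure-Python capability under a stable name."""
    def register(fn):
        CAPABILITIES[name] = fn
        return fn
    return register

@capability("fs.read")
def read_file(path):
    with open(path, encoding="utf-8") as f:
        return f.read()

@capability("text.search")
def search(pattern, text):
    """Content search with regex support: return match offsets."""
    return [m.start() for m in re.finditer(pattern, text)]

def execute(name, *args, **kwargs):
    """Dispatch a plan step to its capability. No LLM involved:
    execution is deterministic Python."""
    return CAPABILITIES[name](*args, **kwargs)
```

Extending the Worker then reduces to deploying a file that adds one more `@capability(...)` function, which fits the blue-green pipeline described later.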

The Watchtower Layer

Two services that together form the system's safety boundary and autonomous improvement engine:

The Supervisor (:8010) is an immutable component that cannot be modified by the system it oversees. It manages the blue-green deployment pipeline, holds the human-in-the-loop approval gate for all self-modifications, handles service restarts, and receives escrowed task results when agents restart during deployment.

The Cron Service (:8012) runs scheduled automation including the Proposal System (detailed below), episodic memory decay with file locking for race condition safety, schedule syncs, and the web research pipeline.

The Supervisor's immutability is a deliberate architectural constraint. A system that can modify itself must have a component it cannot reach. This is RouxYou's answer to the alignment problem at the system level.

The Proposal System

This is RouxYou's second major architectural innovation, introduced in Phase 25. The system doesn't just execute tasks — it identifies its own areas for improvement and proposes changes to its operator.

Three tiers of observation feed into a unified Proposal Bus:

Tier 1 — Heuristic Observers

Six pure-Python observers run every 30 minutes with zero LLM overhead. They check service health (are all agents responding?), memory pressure (is the episodic store bloated?), codebase drift (are there stale files or unused imports?), task queue health (are tasks stuck or failing repeatedly?), resource usage (disk space, log file sizes), and skill utilization (are capabilities going unused?).
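A hedged sketch of the observer shape: each observer is a plain function returning zero or more findings, and a runner collects them for the Coach tier. The function names, finding fields, and thresholds below are assumptions for illustration:

```python
import shutil

def check_disk_space(threshold_gb=5):
    """Resource-usage observer: flag low free disk space."""
    free_gb = shutil.disk_usage("/").free / 1e9
    if free_gb < threshold_gb:
        return [{"category": "resources", "detail": f"only {free_gb:.1f} GB free"}]
    return []

def check_task_queue(failed_counts, max_failures=3):
    """Queue-health observer: flag tasks that keep failing."""
    return [
        {"category": "queue", "detail": f"task {tid} failed {n}x"}
        for tid, n in failed_counts.items()
        if n >= max_failures
    ]

def run_observers(observers):
    """Run every observer and collect raw findings. Pure Python,
    zero LLM calls; findings feed the Tier 2 Coach."""
    findings = []
    for obs in observers:
        findings.extend(obs())
    return findings

findings = run_observers([lambda: check_task_queue({"t1": 4, "t2": 1})])
```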

Tier 2 — LLM Coach Enrichment

Raw findings from Tier 1 are passed to a local ~14B LLM for analysis. The Coach enriches proposals with root cause analysis, confidence scores (0.0–1.0), suggested remediation strategies, and cross-references with episodic memory to detect recurring patterns. Priority is adjusted dynamically — an issue that recurs three times gets escalated automatically.
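The recurrence-escalation rule can be sketched directly. In this illustration the confidence score is passed in as a stand-in for the 14B model's analysis, and the priority bump is an assumed value; only the three-recurrence threshold comes from the text above:

```python
from collections import Counter

recurrences = Counter()

def enrich(finding, confidence, base_priority=5):
    """Attach Coach metadata and escalate recurring issues.
    `confidence` stands in for the local LLM's own score here."""
    key = finding["detail"]
    recurrences[key] += 1
    priority = base_priority
    if recurrences[key] >= 3:          # third recurrence: escalate automatically
        priority = min(10, base_priority + 2)
    return {**finding, "confidence": confidence,
            "priority": priority, "recurrence": recurrences[key]}

# The same issue observed three times gets its priority raised.
for _ in range(3):
    proposal = enrich({"detail": "episodic store bloated"}, confidence=0.8)
```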

Tier 3 — Web Research

A daily scheduled task searches the web via a self-hosted search engine for improvement patterns relevant to the system's technology stack. Seven rotating topic areas are covered over a weekly cycle: agent orchestration, memory retrieval, local LLM inference, deployment reliability, code intelligence, scheduling automation, and emerging patterns. Search results are evaluated by the local LLM for relevance and actionability. Only concrete, implementable findings with relevance scores above 0.6 are published as proposals.

The Proposal Bus

All three tiers publish to a shared data structure with persistent state. Each proposal carries a title, description, category, priority (1–10), source attribution, confidence score, evidence, suggested executor, and reversibility flag. Proposals track their full lifecycle: pending, approved, executing, completed, failed, or dismissed. Dismissed proposals enter a cooldown period to prevent re-proposal.
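The fields listed above map naturally onto a single record type. A minimal sketch, assuming a dataclass representation and an illustrative one-week cooldown (the actual cooldown length is not specified in this document):

```python
from dataclasses import dataclass, field
import time

LIFECYCLE = ("pending", "approved", "executing", "completed", "failed", "dismissed")

@dataclass
class Proposal:
    title: str
    description: str
    category: str
    priority: int                  # 1-10
    source: str                    # e.g. "heuristic", "coach", "web_research"
    confidence: float              # 0.0-1.0
    evidence: list = field(default_factory=list)
    executor: str = "worker"       # suggested executor
    reversible: bool = True
    status: str = "pending"
    dismissed_at: float = 0.0

    def dismiss(self):
        self.status = "dismissed"
        self.dismissed_at = time.time()

    def in_cooldown(self, cooldown_s=7 * 24 * 3600):
        """Dismissed proposals cannot be re-proposed during the cooldown."""
        return (self.status == "dismissed"
                and time.time() - self.dismissed_at < cooldown_s)

p = Proposal("Prune stale imports", "3 unused imports flagged by drift check",
             category="codebase", priority=4, source="heuristic", confidence=0.9)
p.dismiss()
```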

Human Approval Gate

Every proposal requires human approval through the dashboard before execution. The operator sees the evidence, the Coach's reasoning, the confidence score, and the proposed action. Approved proposals enter the Orchestrator's task queue as first-class tasks — the same queue that handles human-initiated work. This means every proposal's outcome is recorded in episodic memory, creating a feedback loop: the system learns from the results of its own suggestions.

The Proposal System is not a separate agent. It's a capability layer that any part of the system can contribute to (heuristics, LLM analysis, web research) and the Orchestrator consumes. The system's "Coach" is emergent — it arises from the interaction of observation, enrichment, and memory, not from a single monolithic component.

Self-Modification Pipeline

This is RouxYou's primary architectural innovation and its clearest differentiator from every other agent framework in the current landscape.

Blue-Green Deployment

When the system generates a modification to its own code, the change does not go directly into production. Instead:

  1. Stage: The Deployer copies the target file, applies the patch, and creates a staging instance in an isolated directory
  2. Boot: The modified code is launched on a staging port. Health endpoint confirms the service is alive and functional
  3. Approval Gate: The Watchtower presents the proposed change to the human operator for review. This gate is immutable — the system cannot bypass it
  4. Swap: The Gateway's route table is updated to point to the staging instance. Production is archived for rollback
  5. Watchdog: A 60-second monitoring period begins. Health checks run every 5 seconds. Three consecutive failures trigger automatic rollback
  6. Normalize: After the watchdog passes, ports are normalized — the promoted code moves to the standard production port and the staging process is retired
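The watchdog in step 5 can be sketched as a polling loop. In the real pipeline the health check would hit the staged service's health endpoint; here it is an injected function, and the verdict strings are illustrative:

```python
import time

def watchdog(check_health, duration_s=60, interval_s=5, max_failures=3):
    """Post-swap monitoring: three consecutive failed health checks
    trigger rollback; surviving the full window promotes the deploy."""
    consecutive = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if check_health():
            consecutive = 0                # any success resets the streak
        else:
            consecutive += 1
            if consecutive >= max_failures:
                return "rollback"
        time.sleep(interval_s)
    return "promote"
```

With the defaults this matches the 60-second window and 5-second cadence above; shrinking the parameters makes it cheap to exercise in tests.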

Anchor Validation

A key discovery during development: LLM-generated patches frequently reference code that doesn't actually exist in the target file — the model hallucinates anchors from memory or imagination rather than reading the actual source. RouxYou addresses this with file caching before any deploy action, anchor verification against cached content, and fuzzy correction including quote normalization, whitespace tolerance, decorator/function block rebuilding, and line-prefix matching.
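The core of anchor verification is checking a proposed anchor against the cached file before any patch is applied. A minimal sketch covering only the whitespace-tolerance case; the real pipeline also normalizes quotes, rebuilds decorator/function blocks, and matches line prefixes:

```python
def verify_anchor(anchor, source, fuzzy=True):
    """Check that an LLM-proposed patch anchor exists in the cached
    source; fall back to whitespace-tolerant line matching."""
    if anchor in source:
        return anchor                       # exact match against cached file
    if not fuzzy:
        return None
    wanted = " ".join(anchor.split())       # collapse runs of whitespace
    for line in source.splitlines():
        if " ".join(line.split()) == wanted:
            return line                     # corrected anchor, as written on disk
    return None                             # hallucinated anchor: reject the patch
```

Returning the line as it actually appears on disk is the important part: the deployer patches against reality, not against the model's memory of the file.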

System File Protection

System source files are blocked from direct write operations. All modifications must route through the deploy pipeline, which enforces the full blue-green sequence. This prevents the AI from accidentally or intentionally clobbering production code without safety checks.
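A sketch of how a write guard like this could sit in front of the Worker's filesystem capabilities. The protected directory names are hypothetical; the point is the routing rule, not the paths:

```python
from pathlib import Path

# Illustrative protected roots -- not RouxYou's actual layout.
PROTECTED = [Path("agents"), Path("gateway"), Path("watchtower")]

def guard_write(path):
    """Refuse direct writes to system source; such changes must go
    through the blue-green deploy pipeline instead."""
    p = Path(path)
    if any(root == p or root in p.parents for root in PROTECTED):
        raise PermissionError(f"{path}: system file, use the deploy pipeline")
    return p
```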

Memory Architecture

Episodic Memory

RouxYou maintains a store of episodic memories — records of past task executions including what was attempted, what plan was generated, what succeeded, what failed, and what was learned. These episodes are injected into the Coder's context when planning similar tasks, enabling the system to learn from its own history without any model fine-tuning or retraining. Completed proposals are also recorded as episodes, creating a feedback loop between the Proposal System and future planning.

Memory Decay

Stale episodic memories can actively harm performance — teaching the system outdated patterns, incorrect endpoints, or deprecated approaches. RouxYou implements automated memory decay on a scheduled cadence. The decay system uses file locking to prevent race conditions when multiple processes access the memory store simultaneously. Episodes are scored by age, relevance, and outcome, with low-scoring entries pruned automatically.
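The age/relevance/outcome scoring can be sketched as an exponential decay. The weights and half-life below are illustrative, not RouxYou's actual tuning, and the file-locking wrapper around store access is omitted for brevity:

```python
import time

def decay_score(episode, now=None, half_life_days=30):
    """Score an episode by age, outcome, and relevance; low scores
    get pruned. All constants here are assumptions for illustration."""
    now = now or time.time()
    age_days = (now - episode["created_at"]) / 86400
    freshness = 0.5 ** (age_days / half_life_days)   # exponential age decay
    outcome = 1.0 if episode["succeeded"] else 0.4   # failures fade faster
    return freshness * outcome * episode.get("relevance", 1.0)

def prune(store, threshold=0.1, now=None):
    """Keep only episodes still worth injecting into planning context."""
    return [ep for ep in store if decay_score(ep, now=now) >= threshold]

now = time.time()
store = [
    {"created_at": now, "succeeded": True},                 # fresh success
    {"created_at": now - 300 * 86400, "succeeded": False},  # stale failure
]
kept = prune(store, now=now)
```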

RAG Integration

A vector-based retrieval system ingests project documentation and file contents to provide contextual grounding for agent decisions. This complements episodic memory by providing broader project context beyond individual task records.
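The retrieval step reduces to ranking ingested documents by similarity to the query. The toy sketch below uses bag-of-words cosine similarity so it stays self-contained; the actual system would use real embeddings behind a vector index (the architecture diagram shows FAISS behind the Memory service):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for real embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Rank ingested docs against the query and return the top-k
    as grounding context for agent decisions."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = ["deploy pipeline uses blue-green swaps",
        "memory decay prunes stale episodes",
        "the worker executes filesystem operations"]
context = retrieve("how does the deploy pipeline swap?", docs, k=1)
```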

What Makes RouxYou Different

Capability                   Cloud Frameworks   Academic Systems      Local Platforms   RouxYou
Local inference              Partial            No (cloud compute)    Yes               Yes — primary constraint
Multi-agent orchestration    Yes                N/A                   No                Tiered routing w/ LLM bypass
Self-modification            No                 Yes (benchmark only)  No                Blue-green w/ rollback
Safety constraints           N/A                Theoretical           N/A               Immutable Watchtower
Self-improvement proposals   No                 No                    No                3-tier observe → propose → execute
Episodic learning            File-based notes   Population-based      No                Task-level w/ automated decay
Zero cloud dependency        No                 No                    Partial           Complete stack
Consumer GPU                 N/A                No                    Yes               16GB VRAM, 32GB RAM
Production deployed          Yes                No                    Yes               Daily use since Jan 2026

Development Timeline

This timeline establishes prior art for the architectural concepts described in this document.

December 2025
Initial RAG memory system development. Local LLM infrastructure setup. Conversation history ingestion for persistent context.
January 8, 2026
Self-Modifying Agent System v1 — Orchestrator, Coder, Worker architecture established. First multi-agent coordination on local hardware.
January 18, 2026
Super Worker with 15+ capabilities. Tiered routing implemented. Self-hosted web search integration.
February 6, 2026
Dashboard with Brain Activity Panel. Companion conversational interface. Real-time system monitoring.
February 13, 2026
Task Result Escrow system. JSON normalization pipeline. Coder model upgrade to 14B parameters.
February 17, 2026
Blue-Green Deploy Pipeline. Post-Deploy Watchdog with auto-rollback. Anchor auto-correction and port normalization.
February 19, 2026
Scheduling service added. Conflict detection and calendar integration.
February 20, 2026
rouxyou.com registered. Architecture documented for public disclosure and prior art. System renamed from internal codename to RouxYou.
February 22, 2026
Proposal System (Phase 25.1) completed. Five sub-phases: Proposal Bus with persistent state, Orchestrator integration, episodic memory integration with recurrence tracking and priority escalation, LLM Coach enrichment layer, and web research proposal generation via self-hosted search engine.

System Specifications

Hardware: Mid-range consumer desktop — Intel CPU, NVIDIA RTX 5060 Ti (16GB VRAM), 32GB DDR4 RAM. No datacenter hardware. No cloud compute.

Router LLM: ~3B parameter model (~6GB VRAM) for intent classification, routing, and conversational responses.

Reasoning LLM: ~14B parameter model (4-bit quantization) for deep reasoning, code generation, multi-step planning, proposal enrichment, and web research evaluation.

Language: Python. Dashboard built with Streamlit.

Infrastructure: All supporting services (search engine, home automation, monitoring) are self-hosted; nothing in the stack runs outside the local network.

Enterprise Applications

RouxYou was built for a single operator on a desktop. But the architecture was designed as isolated services communicating over a network — and that pattern scales.

The same system that runs on a consumer desktop can run on an internal cluster, an air-gapped network, or a private data center. The Gateway, Orchestrator, Coder, Worker, and Watchtower are independent services with health endpoints and clean APIs. They don't care whether they're on one machine or twenty.

The Enterprise Problem

Organizations across healthcare, finance, defense, and legal are blocked from adopting agentic AI by a single constraint: data sovereignty. Every major agent framework requires sending proprietary data — patient records, financial models, classified documents, privileged communications — through a third-party cloud API. Legal and compliance teams are right to reject this.

The result is a growing gap: enterprises that need agentic automation the most are the ones least able to adopt existing solutions.

What RouxYou Offers

A sovereign agentic pipeline where no data leaves the organization's network. No API keys to external providers. No per-token metering. No vendor lock-in. The system improves itself over time through episodic memory and self-generated proposals — without sending training data to a third party. And every self-modification passes through an immutable human approval gate with full audit trail.

The architecture naturally extends to enterprise requirements: the services scale from one machine to an internal cluster, every self-modification passes through an auditable approval gate, and no part of the stack depends on an external provider.

The Cost Argument

A single RouxYou node runs on hardware that costs less than two months of enterprise API spend. A 10-node internal cluster providing department-level agentic AI costs less than a single year of per-token cloud billing — and the hardware doesn't expire. There are no seats, no tiers, no usage caps. The system runs until you turn it off.

RouxYou was proven on a single desktop because that's the hardest constraint to engineer around. If it runs on 16GB of VRAM and 32GB of RAM, it runs anywhere. Enterprise deployment isn't a redesign — it's removing constraints.

Philosophy

RouxYou exists because the operator believed that an AI system that manages your life should not report to someone else's server. That the orchestration layer — the part that decides what to do, when to do it, and how to improve — is too important to rent. That a system powerful enough to modify itself must have safety constraints it cannot override. And that consumer hardware, carefully engineered around, is sufficient to run what billion-dollar companies are selling as cloud services.

The name combines roux — the foundational base in Cajun and Creole cooking from which all flavor builds — with you, reflecting the system's core principle: it runs on your hardware, learns from your patterns, and answers to you alone.

A roux is simple — flour and fat, heated slowly, stirred constantly. But from that simple base, everything else is built. RouxYou is the base. What gets built on top of it is up to you.