The Agentic Mesh — what CHP can do with the fleet as it stands
Stitch hosts into one correlated trace across an edge mesh.
Synced from
chp-core/docs/agentic-mesh.md.
Status: design / approach doc. Describes the agentic layer we can build on the live four-node CHP mesh, the Hugging Face resource surface we can draw on, and the flagship "CHP develops CHP" loop. Companion to
docs/synology-onboarding.md(node bring-up) and the mesh control-plane work (mesh stats/ capacity routing).
1. The substrate: four nodes, one capability namespace
The mesh is a single capability namespace (chp.adapters.*) routed through one
gateway, with capacity-aware routing, federated evidence, and policy governance
already spanning every node. The agentic layer does not need new infrastructure —
it is an orchestration pattern over the capabilities that already exist.
| Node | Role | Live adapters | Contributes to an agent loop |
|---|---|---|---|
| primary | orchestrator / dev brain | git github ci conformance composition planning scout radicle release · huggingface local_llm vllm tei smolagents · safety delegation audit secrets host | drives the loop; reads/writes code; verifies; commits; governs |
| inference (Apple-Silicon Mac) | model serving | huggingface local_llm vllm tei · filesystem audit host | runs the model that does the reasoning/generation |
| worker (Mac) | compute | process jobs http filesystem audit host | parallel/batch execution, shelling out, fan-out jobs |
| nas (Synology) | storage | filesystem synology process | corpora, artifacts, datasets, build outputs |
Two properties make this more than "a bunch of microservices":
- Every capability is governable and evidenced.
safety.assessscores each invocation; high-risk actions (*host.update*,bash,delete, …) require approval;auditwrites a hash-chained, queryable record on each node, andreplay(correlation_id)/mesh auditreassemble a run across the fleet. - Routing is capacity-aware.
gateway.selection: least_loadedalready sends inference work to the node with the most GPU/CPU headroom; an agent loop gets this for free.
So an "agent" here is: a reasoning driver that calls CHP capabilities as tools, routed across nodes, with governance and an audit trail as first-class outputs.
2. The agentic layer
┌─────────────────────── primary (orchestrator) ───────────────────────┐
│ reasoning driver ──calls──▶ CHP capabilities (= the agent's tools) │
│ (smolagents.run / a CHP │ scout.* plan.* git.* github.* ci.* │
│ planning loop) │ conformance.* filesystem.* … │
└───────────────┬──────────────────────────┬───────────────────────────┘
│ gateway (capacity-routed, governed, evidenced)
┌─────────────┼──────────────┬───────────────────────┐
▼ ▼ ▼ ▼
inference worker nas (HF cloud)
local_llm/mlx jobs/process synology/fs Inference Providers
generate/chat batch fan-out corpora/artifacts spill / specialtyThe reasoning driver has two viable forms, both already present:
smolagents.run(on primary) — HF's code-first agent framework. We expose CHP capabilities to it as tools; smolagents handles the think→act→observe loop. Fastest path to a working agent.- A CHP-native loop —
planning.*for decomposition + explicit capability calls. More control, fully inside CHP's evidence model, no framework dependency.
Either way the model behind the reasoning is served on the inference node (see §4), and every tool call is a governed, evidenced capability invocation.
3. Flagship: "CHP develops CHP"
A self-hosted coding agent that improves this very repository, running entirely on your mesh. Each step is an existing capability on a specific node:
| Step | Capability | Node | Notes |
|---|---|---|---|
| 1. Find work | scout.* (analyze repo, surface hotspots/gaps) | primary | reads the tree |
| 2. Plan | planning.* / composition.* | primary | decompose into a change set |
| 3. Generate | local_llm.chat / mlx.chat | inference | the model writes the diff |
| 4. Apply | filesystem.write_file | primary | within an allowed path |
| 5. Verify | ci.run + conformance.* | primary | tests + protocol conformance gate |
| 6. Govern | safety.assess on each write/commit | primary | high-risk steps need approval |
| 7. Commit | git.* → github.* (PR) | primary | on a branch, never default |
| 8. Record | audit on every node + replay(corr_id) | fleet | full provenance of the run |
This is differentiated precisely because of steps 6 and 8: it is not "an agent that edits code," it is an agent whose every action is risk-assessed and recorded in a tamper-evident chain you can replay. That is the CHP thesis, dogfooded.
Guardrails to build in from the start: branch-only commits, an allowed-path
filesystem policy, conformance as a hard gate before any PR, and safety set to
require approval for git push/github actions.
4. Serving the model — pull via HF, serve via MLX (new adapter)
Decision: pull the model through the HF adapter, serve it with MLX on the inference Mac (MLX is ~2–3× faster than Ollama and ~1.5× faster than llama.cpp on Apple Silicon — but macOS-only).
4a. Pull (already supported)
huggingface.search_models/leaderboard_scores→ choose a Qwen3-family model (Apache-2.0; a dense ~4B–35B variant fits a Mac; MoE for headroom).huggingface.pull_for_local_llm(andquantize_to_ggufif a GGUF runtime is used) → fetch weights to the inference node's HF cache. Routed withprefer="inference".
4b. Serve — chp-adapter-mlx (to build)
MLX needs an OpenAI-compatible shim to fit our inference adapter contract. New
adapter chp-adapter-mlx:
- Wraps
mlx-lm(mlx_lm.serverexposes an OpenAI-compatible/v1/...), or callsmlx_lmin-process. - Registers
chp.adapters.mlx.generate/.chat/.list_models— the same shape aslocal_llm/vllmso the gateway treats it as another inference owner and the capacity router (GPU-utilization-aware) can pick it. - Tagged so
_INFERENCE_HINTSin the router recognizes it (mlxadded to the hint list) → GPU-aware routing applies. - Config via env (
MLX_MODEL,MLX_BASE_URL) mirroring the local_llm adapter.
This keeps the abstraction clean: the agent calls *.chat; the mesh decides
whether MLX (local), vLLM, Ollama, or a cloud provider serves it.
5. The broader Hugging Face surface (beyond local models)
HF in 2026 is a registry plus libraries plus a hosted inference layer (2.4M+ models, 730K+ datasets, ~1M Spaces). Map of what we can draw on and how it plugs into the mesh:
| HF resource | What it gives us | Our hook | Use in the agentic mesh |
|---|---|---|---|
| Models / Hub | open weights (Qwen3, …) | huggingface.pull*, search_models, model_card | the local reasoning model (§4) |
| Inference Providers | OpenAI-compat API routed to Together/Cerebras/Groq/… (fastest/cheapest) | llm.* / http.* (small adapter possible) | spill backend: when the inference GPU is saturated (mesh stats), route to cloud; or use a frontier model for hard steps the local model can't do |
| Datasets | 730K+ datasets | huggingface.load_dataset, search_datasets, dataset_preview | eval sets for the agent's self-tests; RAG corpora staged on the NAS |
| Spaces | ~1M hosted demo apps (Gradio/Streamlit) | huggingface.call_space | call specialty models (ASR, image/video gen, OCR) without hosting them |
| TEI / TGI / vLLM | production serving runtimes | tei.*, vllm.* | embeddings (tei.embed/rerank) for RAG; vLLM for non-Apple GPUs |
| smolagents | code-first agent framework | smolagents.run | the reasoning driver (§2) |
| PEFT / AutoTrain | LoRA/QLoRA fine-tuning | huggingface.finetune, apply_adapter | fine-tune a small model on CHP's own specs/codebase for sharper CHP-specific generation |
| Leaderboards | model rankings by task | huggingface.leaderboard_scores | pick the right model per job, automatically |
| FAISS / embeddings | vector search | huggingface.faiss_index, tei.embed, vector.* | the retrieval half of distributed RAG |
Two of these are strategically interesting beyond the flagship:
- Inference Providers as a capacity spill. The capacity router already knows GPU headroom; the natural extension is a policy: prefer local MLX; spill to an HF Inference Provider when GPU utilization is high or the task needs a frontier model. Local-first, cloud-burst — governed and evidenced like everything else.
- Fine-tune on ourselves.
huggingface.finetune+apply_adapteron the CHP spec/wire-protocol corpus → a LoRA that makes the local model materially better at generating conformant CHP code. The "CHP develops CHP" loop then improves its own generator.
6. Why this is differentiated
Plenty of stacks can run an agent that edits code or answers from a corpus. The mesh's distinction is operational, not model-quality:
- Governed — every action is risk-assessed; high-risk actions gated by policy, not vibes.
- Evidenced — a tamper-evident, replayable, fleet-wide record of exactly what the agent did and why. Auditable autonomy.
- Capacity-aware & portable — the same agent runs against local MLX, an on-prem GPU, or a cloud provider; the mesh routes by headroom and policy. No code change.
- Composable across owned hardware — storage (NAS), compute (worker), inference (GPU Mac), orchestration (primary) compose into one system you own end-to-end.
7. Build order
- Keystone — model serving. Build
chp-adapter-mlx(OpenAI-compat overmlx-lm), addmlxto the router's inference hints, deploy to the inference node, pull a Qwen3 model viahuggingface.pull_for_local_llm. Verifymlx.chatroutes withprefer="inference"and shows up in capacity routing. - Reasoning driver. Wire
smolagents.run(or a CHP-native planning loop) to a curated tool set (scout,filesystem,git,ci,conformance), pointed at the local model. - Flagship loop. Assemble the §3 pipeline behind one entrypoint
(
examples/chp-develops-chp/), branch-only + conformance-gated + safety-gated, withreplayof each run as the deliverable artifact. - Extensions. Inference-Provider spill policy; LoRA fine-tune on the CHP corpus; distributed RAG over NAS docs as a second demo.
Each step ships the usual way (tests → sync → chp-core precommit → push → wheel),
and the inference-node pieces (MLX install, model pull) are verified live over the
mesh — no SSH, via host.stats + the inference adapters.