Hermes Architecture | DeepHarness

What’s new

DeepHarness v0.3.0 introduces eight infrastructure patterns that make the AI system deeper, more resilient, and more transparent.

Multi-model reasoning

Hard questions no longer go to a single model. The Mixture-of-Agents engine runs your query through three AI models in parallel — Claude Sonnet, Gemini Pro, GPT-4.1 — then synthesises their responses through a dedicated aggregator. Areas of agreement are reinforced. Contradictions are resolved. You get one answer, informed by three perspectives.

Context compression

Long conversations no longer degrade. A two-phase compressor prunes verbose tool outputs and summarises middle turns, preserving your most recent exchanges and original intent verbatim. Conversations scale to hundreds of turns without losing context.

Credential pool rotation

API key management is now automatic. Multiple OpenRouter keys rotate using least-used or round-robin strategies, with automatic 429 backoff and permanent dead-key marking for revoked credentials. Zero-downtime key rotation requires no configuration changes.

Keyword pre-filter

Simple messages like “hello” or “thanks” now skip the LLM-based complexity classifier entirely, routing straight to the fast tier. A deterministic keyword check runs in microseconds, saving the cost of a classification call on 30-40% of queries.

Guided playbooks

Three built-in playbooks — Dashboard Setup, Data Source Connection, Chart Builder — guide users through multi-step workflows as structured conversations. Steps collect data, validate inputs, and allow skipping. Custom playbooks can be defined for any domain.

Delegation sandbox

When one agent delegates to another, the handoff happens inside a sandbox with enforced timeouts (30s), depth limits (2 levels), concurrency caps (3 parallel), and output truncation. Agents act freely within boundaries you define.

Trajectory recording

Every routing decision — intent classification, agent selection, model routing, delegation, compression — is recorded as a traceable trajectory. Full observability into how the AI arrived at its answer.

Session search

Full-text search across conversation history with keyword matching, agent filtering, and time-range queries. Find anything you discussed, instantly.

Infrastructure improvements

Advisory agents (product and engineering) now have full system prompt definitions and integrate with the MoA engine for complex strategic queries.
Cost routing respects the pre-filter, skipping the classifier for trivially simple messages.
OpenRouter provider creates fresh instances per call for proper key rotation.