The Hermes agent · operations build

Hermes Agent features,
with proof and control.

Run Hermes-powered crews that browse, search, inspect a codebase, call Python, and analyze media. Hermes Loop wraps every run in a triage inbox, a persistent job queue, approval gates, and a hashed receipt that proves what happened.

Learning loop — every settled mission distils into reusable Skill rows on the same crew.
Cross-session memory recall — operator-approved context queried across all prior runs, injected into the next one.
MCP integration — native Model Context Protocol client; remote tools surface alongside built-ins.

Open the control room → · How it works · Runtime tools · Search / media tools

Surfaces

Crews shipped

Tools wired

Auto-sent

Mission control · 3 active

desk · live

MS-2034 · Paper Trading · WAITING · AAPL/NVDA: weekly paper trade

MS-2033 · Bug Hunter · RUNNING · Audit demo.shop checkout flow

MS-2031 · Life Admin · COMPLETED · Dispute refund · order A4827

Agent log · MS-2034

00:01.220 · market-scout · completed

00:01.880 · news · completed

waiting on user approval…

Tool calls · last 60s

explorer → web_snapshot · 410ms

backtest → price_series_lookup · 320ms

evidence → document_extract · 180ms

follow-up → deadline_create · needs ✓

Approval queue · 2 pending

EMAIL_DRAFT: Refund request, order A4827

TRADE_SIMULATION: AAPL · BUY pullback

Simulation only. No order will be placed.

AAPL · paper

BUY · pullback to 20-DMA

Entry 214.20 · Stop 206.70 · Target 227.05

conf 0.62 · risk 0.42 · awaits approval

Triage inbox · Persistent job queue · Governed memory · Tool execution layer · Approval queue · Workflow receipts · Trust ledger · Scheduled runs · Hermes-native · Local demo mode · Zod-validated

Hermes Agent features wired here

Not just a dashboard. Real agent capabilities with receipts.

Hermes does the reasoning. Hermes Loop wires the runtime around it: browser, terminal, Python, search/media providers, schedules, webhooks, approvals, memory, evals, and proof.

Browser automation

Shipped

Playwright QA audits URLs with screenshots, console capture, accessibility and layout checks.

Terminal tool

Shipped

terminal_exec runs safe repo diagnostics and approval-gates risky commands.

Python RPC

Shipped

python_rpc runs short scripts in an ephemeral workspace with policy checks.

Web search

Shipped

web_search live via Tavily, with Brave + SerpAPI as fallback providers.

Vision

Shipped

vision_analyze on Gemini 2.5 Flash via Hermes — image URL or base64 → structured findings.

Image + TTS

Shipped

image_generate uses Hermes image-capable models (Gemini, Flux). text_to_speech uses ElevenLabs eleven_multilingual_v2.

NL scheduling

Shipped

Hermes parses plain-English schedules into the existing cadence engine.

Webhooks

Shipped

Signed inbound webhook turns external messages into InboxItems and triage jobs.
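
The page does not say how inbound webhooks are signed. A minimal sketch, assuming HMAC-SHA256 over the raw request body with a shared secret and a hex-encoded signature; `signBody`, `verifySignature`, `toInboxItem`, and the `InboxItem` fields are hypothetical names, not the product's API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the sender signs the raw body with HMAC-SHA256 and sends the hex digest.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}

// Hypothetical inbox shape: only verified payloads become items for triage.
interface InboxItem {
  source: string;
  text: string;
  receivedAt: string;
}

function toInboxItem(rawBody: string, signatureHex: string, secret: string): InboxItem | null {
  if (!verifySignature(rawBody, signatureHex, secret)) return null;
  const payload = JSON.parse(rawBody) as { source?: string; text?: string };
  return {
    source: payload.source ?? "webhook",
    text: payload.text ?? "",
    receivedAt: new Date().toISOString(),
  };
}
```

Verifying against the raw body, before any JSON parsing, keeps the signature check independent of how the payload is later interpreted.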

Multi-model routing

Shipped

Per-role routing wired: HERMES_MODEL_FAST / STRONG / JUDGE / VISION. Unset roles fall back to default; receipt records fallbackUsed=true.
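
The fallback rule above can be sketched as a small resolver. The role names and the `fallbackUsed` flag come from the text; the function name, the return shape, and the placeholder model ids are assumptions for illustration.

```typescript
type ModelRole = "FAST" | "STRONG" | "JUDGE" | "VISION";

interface ModelChoice {
  role: ModelRole;
  model: string;
  fallbackUsed: boolean; // the value a receipt would record
}

// Look up HERMES_MODEL_<ROLE>; an unset or blank value falls back to the default model.
function resolveModel(
  role: ModelRole,
  env: Record<string, string | undefined>,
  defaultModel: string,
): ModelChoice {
  const configured = env[`HERMES_MODEL_${role}`]?.trim();
  if (configured) {
    return { role, model: configured, fallbackUsed: false };
  }
  return { role, model: defaultModel, fallbackUsed: true };
}

// Placeholder ids, not real model names.
const exampleEnv: Record<string, string | undefined> = {
  HERMES_MODEL_FAST: "fast-model-placeholder",
};
const fast = resolveModel("FAST", exampleEnv, "default-model-placeholder");
const judge = resolveModel("JUDGE", exampleEnv, "default-model-placeholder");
```

Here `fast` resolves to the configured model, while `judge` resolves to the default with `fallbackUsed` set to `true`.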

Learning loop

Shipped

After every mission settles, Hermes Loop distils up to 3 reusable lessons into Skill rows. Future runs of the same crew inject them automatically.

Cross-session memory

Shipped

Operator-approved memory queried across all prior sessions, top matches injected into the Triage Agent + first-step prompts.

MCP integration

Shipped

Native MCP (Model Context Protocol) client. Configure MCP_SERVERS and remote tools surface alongside built-ins.

Test search / media · Test runtime · Full parity board

Plain English

The control room for the Hermes Agent.

Hermes Agent is Nous Research's open-source autonomous agent — it lives on your server, remembers what it learns, and gets more capable the longer it runs. Hermes Loop is the operator surface: you launch missions, run named crews of subagents, gate risky outputs with approvals, and produce a hashed receipt that proves what happened.

What Hermes Agent gives you

The autonomous engine: persistent memory, subagents with their own conversations and terminals, natural-language cron, and native tools (web search, browser automation, vision, multi-model reasoning).

What Hermes Loop adds on top

The governance layer: crews, a job queue, approvals, hashed receipts, a trust ledger, evals, schema self-correction, real-cost accounting, and a full audit trail.

Why it is different

A chatbot gives an answer. The Hermes Agent does the work autonomously; Hermes Loop proves it: who ran, which tools were called, which memory was used, what was approved, and what hash signed off the run.
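
How the receipt hash is computed is not documented here. One plausible approach, sketched under the assumption that the receipt is a JSON object hashed with SHA-256 over a key-sorted serialization; the field names in the example receipt are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with object keys sorted so the same receipt always yields the same bytes.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  const pairs = keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] as Json)}`);
  return `{${pairs.join(",")}}`;
}

function receiptHash(receipt: Json): string {
  return createHash("sha256").update(canonicalize(receipt), "utf8").digest("hex");
}

// Hypothetical receipt: who ran, which tools were called, what was approved.
const exampleReceipt: Json = {
  missionId: "MS-2034",
  crew: "paper-trading",
  tools: ["price_series_lookup"],
  approvals: [{ type: "TRADE_SIMULATION", decision: "approved" }],
};
const exampleHash = receiptHash(exampleReceipt);
```

Because keys are sorted before hashing, two receipts with the same content hash identically regardless of field order, and any changed field changes the hash.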

What to try first

Run Bug Hunter on /demo-target, approve the QA report, generate the receipt, then open Trust to see the run roll up into risk and proof.

Concrete uses

What people can actually do with it

Audit a website

Bug Hunter opens the page in Playwright, captures evidence, finds accessibility/conversion issues, and stages a QA report for approval.

Open workflow ->

Inspect a repo

Codebase Debugger lets Hermes use governed terminal commands, then writes a diagnosis without editing files.

Open workflow ->

Search or analyze media

Provider-backed web search, vision, image generation, and TTS are testable from the Media section before agents use them.

Open workflow ->

Schedule or trigger work

Use natural-language schedules or a signed webhook to turn external requests into queued agent missions.

Open workflow ->

Create your own crew

Define agents, roles, instructions, tools, and execution order, then run that crew like a built-in workflow.

Open workflow ->

Prove what happened

Receipts and Trust show the timeline, tools, approvals, memory, risk, model cost source, and integrity hash.

Open workflow ->

02 — Crews

Crews that
actually work.

Four pre-built crews ship in the box. Each is a sequence of specialists with strict output schemas and a sample deliverable you can preview right now.

Markets · 6 agents

Paper Trading Desk

Research, thesis, and a paper-trade ticket, fully simulated.

01 Market Scout: Scans the watchlist and produces a market brief.

02 News Agent: Summarizes relevant news and event flow.

03 Strategy Agent: Builds a single, defensible trade thesis.

04 Risk Agent: Calculates downside, invalidation, and sizing notes.

05 Backtest Agent: Runs a simple historical simulation from seeded data.

06 Paper Execution Agent: Records the simulated trade decision only.

Sample output

AAPL · BUY pullback to 20-DMA. Entry $214.20 · Stop $206.70 · Target $227.05. Confidence 0.62. Simulation only; awaits approval.

Launch this crew → paper-trading

Quality · 6 agents

Bug Hunter Crew

Crawl, test flows, audit a11y, and ship a client-ready report.

01 Explorer Agent: Maps pages and key flows via real browser audit.

02 Flow Tester Agent: Tests signup/contact/checkout-style flows.

03 Accessibility Agent: Checks labels, contrast notes, keyboard risks.

04 Copy Agent: Flags confusing or weak copy.

05 Bug Reporter Agent: Creates recommended fixes for each issue.

06 Report Agent: Drafts a client-ready QA report summary.

Sample output

4 issues across 4 pages. 2 high-severity (signup duplicates on double-click; missing form labels). Recommended fixes attached. Awaits export approval.

Launch this crew → bug-hunter

Personal · 6 agents

Life Admin Crew

Evidence, policy angle, draft, and a follow-up plan.

01 Intake Agent: Identifies the task and required outcome.

02 Evidence Agent: Extracts dates, amounts, names, order numbers from docs.

03 Policy Agent: Creates a rights/policy angle.

04 Draft Agent: Writes the email/message/script.

05 Critic Agent: Checks missing info and weak points.

06 Follow-up Agent: Creates reminders and next steps.

Sample output

Refund draft for order A4827: firm and polite. 7-business-day window. Follow-up plan at +3d / +7d / +14d. Nothing sent.

Launch this crew → life-admin

Engineering · 5 agents

Codebase Debugger Crew

Repo scout · build runner · error analyst · fix planner · report.

01 Repo Scout: Maps the workspace and gathers cheap diagnostic context.

02 Build Runner: Runs typecheck / lint / build and captures the failure surface.

03 Error Analyst: Groups errors by root cause and identifies the load-bearing files.

04 Fix Planner: Drafts a concrete fix plan per root cause. Read-only: no edits.

05 Report Agent: Wraps findings + plan into a deliverable.

Sample output

Launch this crew → codebase-debugger

03 — Mission lifecycle

From prompt to
completed mission.

Six explicit stages. Each one writes to the audit trail. Nothing jumps a stage; nothing leaves the desk on its own.

01 OBJECTIVE

Brief the crew

Pick a template, type an objective, attach optional context.

02 CREW

Specialists run in sequence

Each agent returns Zod-validated JSON. Outputs feed forward.

03 TOOLS

Tools called inline

Agents request tools by name. Results inject into the next prompt.

04 AUDIT

Everything logged

Prompts, responses, tool I/O, latency, tokens, system events.

05 APPROVAL

Risky steps queued

Drafts, trade tickets, exports, follow-ups land in your inbox.

06 DELIVERABLE

Shipped + traceable

Final artefact stored, indexed, and reviewable forever.
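
The CREW and AUDIT stages above can be sketched as a sequential runner. This is an illustration, not the product's orchestrator: `AgentStep`, `runCrew`, and the stubbed agents are hypothetical, and a hand-written check stands in where a Zod schema's `parse` would sit, to keep the sketch dependency-free.

```typescript
type Context = Record<string, unknown>;

interface AgentStep {
  name: string;
  run: (context: Context) => string;  // raw model response as JSON text (stubbed here)
  parse: (value: unknown) => Context; // throws when the output shape is wrong
}

interface AuditEvent {
  step: string;
  status: "completed" | "failed";
  detail: string;
}

function runCrew(steps: AgentStep[], objective: string): { context: Context; audit: AuditEvent[] } {
  let context: Context = { objective };
  const audit: AuditEvent[] = [];
  for (const step of steps) {
    try {
      const parsed = step.parse(JSON.parse(step.run(context)));
      context = { ...context, [step.name]: parsed }; // outputs feed forward
      audit.push({ step: step.name, status: "completed", detail: "output validated" });
    } catch (error) {
      audit.push({ step: step.name, status: "failed", detail: String(error) });
      break; // later stages do not run after a failed one
    }
  }
  return { context, audit };
}

const scout: AgentStep = {
  name: "market-scout",
  run: () => JSON.stringify({ symbols: ["AAPL", "NVDA"] }),
  parse: (value) => {
    const symbols = (value as { symbols?: unknown }).symbols;
    if (!Array.isArray(symbols) || !symbols.every((s) => typeof s === "string")) {
      throw new Error("symbols must be a string array");
    }
    return { symbols };
  },
};

const strategy: AgentStep = {
  name: "strategy",
  run: (context) => {
    const brief = context["market-scout"] as { symbols: string[] };
    return JSON.stringify({ thesis: `long ${brief.symbols[0]} pullback` });
  },
  parse: (value) => {
    const thesis = (value as { thesis?: unknown }).thesis;
    if (typeof thesis !== "string") throw new Error("thesis must be a string");
    return { thesis };
  },
};

const result = runCrew([scout, strategy], "AAPL/NVDA weekly paper trade");
```

Each step only sees validated output from earlier steps, and every step leaves an audit event whether it completes or fails.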

04 — Tool execution layer

Tools agents can
actually call.

Agents request tools by name. The desk validates input, runs the tool, persists the call, and feeds the result into the next prompt. Every tool is sandboxed — no shell, no live trading, no auto-send.
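
A minimal sketch of that validate, run, persist sequence, assuming an in-memory registry and call log. The `Tool` interface, the `callTool` helper, and the `deadline_create` input shape are hypothetical; only the tool name comes from this page.

```typescript
interface Tool {
  validate: (input: unknown) => boolean;
  run: (input: unknown) => unknown;
}

interface ToolCallRecord {
  tool: string;
  input: unknown;
  output: unknown;
  status: "ok" | "rejected" | "error";
  ms: number;
}

const callLog: ToolCallRecord[] = []; // stands in for persistent storage

function callTool(registry: Map<string, Tool>, name: string, input: unknown): ToolCallRecord {
  const started = Date.now();
  const tool = registry.get(name);
  let record: ToolCallRecord;
  if (!tool || !tool.validate(input)) {
    // Unknown tool names and invalid input never reach run().
    record = { tool: name, input, output: null, status: "rejected", ms: 0 };
  } else {
    try {
      record = { tool: name, input, output: tool.run(input), status: "ok", ms: Date.now() - started };
    } catch (error) {
      record = { tool: name, input, output: String(error), status: "error", ms: Date.now() - started };
    }
  }
  callLog.push(record); // every call is recorded, whatever the outcome
  return record;
}

const registry = new Map<string, Tool>([
  [
    "deadline_create",
    {
      validate: (input) =>
        typeof input === "object" && input !== null &&
        typeof (input as { dueInDays?: unknown }).dueInDays === "number",
      run: (input) => ({ dueInDays: (input as { dueInDays: number }).dueInDays, needsApproval: true }),
    },
  ],
]);
```

Because lookup is by name against a fixed registry, a request for a tool that is not registered is rejected and still leaves a record.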

QA & research · tool

web_snapshot

Fetches a public URL — title, description, headings, sample links, text sample. Blocks localhost, private IPs, file URLs, and metadata endpoints. 8s timeout, 1 MB body cap.
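
The blocking rules could look roughly like the check below. This is a sketch under assumptions: the metadata host list and the function names are illustrative, and it inspects only the literal URL. A production guard would also need to check the address a hostname resolves to. The timeout and body cap are not shown.

```typescript
import { isIP } from "node:net";

// Assumed examples of cloud metadata endpoints, not the product's list.
const METADATA_HOSTS = new Set(["169.254.169.254", "metadata.google.internal"]);

function isPrivateIPv4(host: string): boolean {
  const octets = host.split(".").map(Number);
  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function isPrivateIPv6(host: string): boolean {
  // Loopback, unspecified, IPv4-mapped, unique-local (fc00::/7), link-local (fe80::/10).
  return host === "::1" || host === "::" || host.startsWith("::ffff:") ||
    /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
}

function isAllowedSnapshotUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false; // blocks file: and others
  const host = url.hostname.toLowerCase().replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (METADATA_HOSTS.has(host)) return false;
  if (isIP(host) === 4) return !isPrivateIPv4(host);
  if (isIP(host) === 6) return !isPrivateIPv6(host);
  return true;
}
```

Parsing with `URL` first means shorthand forms such as `http://127.1/` are normalized before the range check runs.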

Markets · tool

price_series_lookup

Returns deterministic synthetic OHLCV — never real broker data. Used by the Backtest Agent to compute win rate and drawdown.
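
"Deterministic synthetic" can be achieved by seeding a pseudo-random generator from the symbol. A sketch of that idea, assuming a mulberry32 generator seeded with an FNV-1a hash; the actual generator and price model behind `price_series_lookup` are not documented here, and `syntheticSeries` and its parameters are hypothetical.

```typescript
interface Bar {
  day: number;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// FNV-1a: turns the symbol into a 32-bit seed.
function seedFrom(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function syntheticSeries(symbol: string, days: number, startPrice = 100): Bar[] {
  const rand = mulberry32(seedFrom(symbol));
  const bars: Bar[] = [];
  let prev = startPrice;
  for (let day = 0; day < days; day++) {
    const open = prev;
    const close = open * (1 + (rand() - 0.5) * 0.04); // daily move within about 2% either way
    const high = Math.max(open, close) * (1 + rand() * 0.01);
    const low = Math.min(open, close) * (1 - rand() * 0.01);
    const volume = Math.floor(1000000 + rand() * 500000);
    bars.push({ day, open, high, low, close, volume });
    prev = close;
  }
  return bars;
}
```

The same symbol and length always produce the same bars, so a backtest over them is repeatable from run to run.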

Life admin · tool

document_extract

Pulls dates, amounts, names, companies, order numbers, and key claims from mission-owned documents only.

Follow-ups · tool

deadline_create

Creates an approvable follow-up reminder. Never sends an email or schedules an external action.

Exports · tool

report_export_draft

Renders a deliverable to MARKDOWN or JSON. Approval-gated by default — the export is staged, not shared.

Safety · guard

What tools can't do

· No shell execution
· No browser form submission
· No live trading or brokerage APIs
· No outbound emails or messages
· web_snapshot blocks localhost + private IPs
· Tool I/O is persisted for audit

05 — Persistence

Persistent by design.

A chat window forgets. A desk remembers. Schedules, replays, run history, and the approval inbox give you continuity instead of a transcript.

Scheduled missions · 3 active

Paper Trading · Mega-cap weekday open

Cadence: Mon–Fri 08:00 · Next run: Tomorrow 08:00

Bug Hunter · Audit checkout flow

Cadence: Every Mon 09:00 · Next run: Mon 09:00

Life Admin · Refund follow-up sweep

Cadence: +3d after draft · Next run: in 2d

Auto-runs into the same orchestrator · See schedules →

core

Scheduled missions

Daily, weekdays, weekly, or monthly cadences. Each fire materializes a fresh mission and runs the orchestrator.
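
The four cadences can be illustrated with a next-run calculation. This is a sketch, not the cadence engine itself: it assumes UTC, a fixed hour on the hour, and an anchor date that supplies the weekday (weekly) or day of month (monthly). `nextRun` and its parameters are hypothetical.

```typescript
type Cadence = "daily" | "weekdays" | "weekly" | "monthly";

function matchesCadence(cadence: Cadence, day: Date, anchor: Date): boolean {
  const dow = day.getUTCDay(); // 0 = Sunday
  switch (cadence) {
    case "daily":
      return true;
    case "weekdays":
      return dow >= 1 && dow <= 5;
    case "weekly":
      return dow === anchor.getUTCDay();
    case "monthly":
      return day.getUTCDate() === anchor.getUTCDate();
  }
}

// Returns the first matching time strictly after `from`, at hourUtc:00 UTC.
function nextRun(cadence: Cadence, from: Date, hourUtc: number, anchor: Date = from): Date {
  const candidate = new Date(
    Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), hourUtc),
  );
  for (let i = 0; i < 400; i++) {
    if (candidate.getTime() > from.getTime() && matchesCadence(cadence, candidate, anchor)) {
      return new Date(candidate.getTime());
    }
    candidate.setUTCDate(candidate.getUTCDate() + 1); // step one day forward
  }
  throw new Error("no run found within 400 days");
}
```

With a monthly anchor on day 29 to 31, this sketch skips months that lack that day; a real engine would need an explicit rule for that case.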

core

Replay any run

Animate the agent + tool timeline of any completed mission. Useful for review, demos, and onboarding.

core

Run history

Every step is keyed to its mission. Tokens, latency, prompts, raw responses, parsed outputs — all retained.

core

Approval inbox

Drafts, trade tickets, exports, reminders, and tool gates land here. Approve, reject, or request changes.

06 — Trust layer

Built for control.

Auditability isn't a setting — it's the product. Every prompt is on disk. Every tool call is on disk. Every risky action waits for a human signature. Every deliverable traces back to the agent step that made it.

Raw prompt + raw response stored on every step
Zod schema validation on every parsed output
Tool input + output persisted with timing + status
Approval queue for trades, drafts, exports, follow-ups, tool gates
Per-mission audit log of agent + tool + system events
Demo mode runs the full pipeline with no external calls

audit · mission MS-2034 · paper-trading

live

00:00.142  system           mission.started   Paper Trading Desk · AAPL/NVDA

00:01.220  market-scout     agent.completed   Market brief · 3 symbols

00:01.880  news             agent.completed   3 highlights extracted

00:02.610  strategy         agent.completed   Thesis: long AAPL pullback

00:03.040  risk             agent.completed   4 risks logged · score 0.42

00:03.520  backtest         tool.started      price_series_lookup AAPL · 60d

00:03.840  backtest         tool.completed    60 bars · synthetic-demo

00:04.020  backtest         agent.completed   win rate 0.55 · DD -7.8%

00:04.510  paper-execution  approval.queued   Trade ticket queued

Open the desk

Your operations terminal,
running locally.

Spin up the dashboard, pick a crew, type an objective, and watch the crew's agents work: calling tools, queuing approvals, writing audit events, and shipping a deliverable. Plug in Hermes for live runs or stay in demo mode; the interface never changes.

Launch command center →

Quickstart · ~3 min

1. Open /dashboard
2. Pick a crew (or seed the demo workspace)
3. Type an objective
4. Click Run mission
5. Watch agents + tools execute
6. Approve the queued items
