
Get AI income methods before they spread.
Free weekly intelligence newsletter.
Hermes Agent cut one user's AI token spend from $130 to $10 per five-day period — a 92% reduction. The system combines SQLite memory that persists across sessions, OpenRouter for smart model routing, and a DRY pattern that converts recurring tasks into pure code once they have been proven to work. No LLM call needed for tasks that have already been solved.
What Problems Does Hermes Solve That Standard AI Agents Do Not?
Standard AI agents have three problems at scale. They forget everything between sessions. They route all requests through expensive models regardless of task complexity. And they repeat the same LLM inference for tasks that run identically every time.
Hermes addresses all three directly:
- SQLite memory — every successful task completion is written to a local database. Hermes recalls relevant context on demand at the start of each session.
- OpenRouter routing — cheap models handle deterministic tasks; capable models handle tasks that require reasoning. The routing decision is automatic.
- DRY pattern — once a recurring task runs successfully, Hermes writes pure code for it. The next time that task runs, the code executes directly with no LLM call.
How Does the DRY Pattern Actually Work?
DRY stands for Don't Repeat Yourself — a programming principle applied to LLM inference. The logic: if a task produces the same output every time it runs, there is no reason to use an LLM for it. The LLM was useful for figuring out the solution once. After that, code is faster, cheaper, and more reliable.
When Hermes successfully completes a recurring task — generating a daily report, formatting a specific data structure, processing a webhook — it writes the solution as a Python function and stores it. The next invocation of that task runs the function directly.
The DRY pattern is what drives the 90% cost reduction at scale.
Over time, your most common tasks become code. LLM calls are reserved for genuinely novel tasks that require reasoning.
How Does the SQLite Memory System Work in Practice?
At the end of each session, Hermes writes a structured summary to a local SQLite database: what task was completed, what approach worked, what context was relevant. At the start of the next session, Hermes queries the database for relevant prior work and loads it automatically.
This means a task you completed three weeks ago — along with the exact approach that worked — is available in the next session without you explaining it again. For an agent working across multiple projects or clients, this is the difference between starting fresh every session and building on accumulated knowledge.
How Do You Set Up Hermes With OpenRouter?
Install Hermes from the repository linked below. Configure your OpenRouter API key in the .env file. Define your model routing rules: which model handles which task types. Start Hermes and run your first task — a simple data processing or summarisation job.
The first session is mostly learning. Hermes figures out the task, completes it, and writes it to memory. The second session for the same task is faster. By the fifth session, recurring tasks are running as pure code.
Connect to Obsidian and Telegram for the full setup Greg demos — Obsidian as the knowledge base, Telegram as the interface for triggering tasks remotely.
Frequently Asked Questions
Does Hermes work without OpenRouter?
Yes, but without model routing you lose the cost optimisation that drives most of the savings. You can configure Hermes to use Anthropic's API directly — it will still benefit from SQLite memory and the DRY pattern, just at standard Anthropic pricing. Add OpenRouter when you are ready to optimise costs.
What kinds of tasks convert to pure code via the DRY pattern?
Any task that produces consistent, predictable output from consistent input. Data formatting, report generation, email parsing, webhook processing, scheduled data pulls, and structured content transformation all qualify. Tasks that require judgment — evaluating a novel situation, making a decision with multiple valid answers — stay as LLM calls.
How does Tailscale fit into the Hermes setup?
Tailscale creates a private network between your devices. With Hermes running on a home server or VPS behind Tailscale, you can trigger tasks from your phone via Telegram without exposing the server to the public internet. It is an optional addition for remote access, not a core requirement.
Is the $130 to $10 saving realistic for typical usage?
It depends on your task mix. If most of your agent usage is recurring, well-defined tasks, the DRY pattern will drive significant savings. If most tasks require novel reasoning each time, the savings are smaller — maybe 30 to 50% from model routing alone. Greg's user who achieved 92% savings had a workflow dominated by recurring tasks that Hermes converted to pure code.
Watch Greg's full Hermes demo: Greg Isenberg on YouTube

Get AI income methods before they spread.
Free weekly intelligence newsletter.

The 6 Claude Code Skills Every AI Agency Should Install First