The keeper's duties, in full.
The evidence commands no other terminal agent ships, the 27-tool belt, how it reasons, the slash commands worth knowing, and the security model, stated plainly.
The evidence commands
No model in the loop. Safe anywhere, scriptable everywhere.
- tomte prove
- Done means verified
- tomte why src/parser.rs:88
- It remembers why, across models
- tomte why-context classify_danger
- It knows the house
- tomte race "fix the flaky retry test" --agents 4
- Don't trust one agent. Race them
- tomte pulse
- Repo Pulse
- tomte handoff --out HANDOFF.md
- The Handoff capsule
Each command accepts --json for scripts. tomte prove exits non-zero when a check fails, so it can gate a commit hook or CI step.
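A minimal sketch of such a gate, written in Python. It relies only on the behaviour stated above, that tomte prove exits non-zero on a failing check. The function names run_gate and main are illustrative and not part of tomte, and the --json output is left unparsed because its schema is not described here.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(command: Sequence[str]) -> int:
    """Run a check command and return its exit code unchanged."""
    completed = subprocess.run(list(command))
    return completed.returncode


def main(argv: Sequence[str]) -> int:
    """Gate on `tomte prove` by default, or on whatever command is passed in."""
    command = list(argv) if argv else ["tomte", "prove"]
    try:
        code = run_gate(command)
    except FileNotFoundError:
        # The binary is not installed or not on PATH: fail closed.
        print(f"gate: {command[0]} not found on PATH", file=sys.stderr)
        return 127
    if code != 0:
        print(f"gate: {' '.join(command)} exited {code}; blocking", file=sys.stderr)
    return code
```

Saved as a hook script, the file would end with raise SystemExit(main(sys.argv[1:])). Git aborts the commit whenever a pre-commit hook exits non-zero, which is what makes the exit code usable as a gate.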
The tool belt
27 tools, streamed and schema-validated.
Files
Read and edit files behind stale-file guards that refuse a write when the file has changed since it was last read, plus a one-step undo.
Search
Regex search, glob, and language-aware code intelligence without setup.
Shell
Run commands with a destructive-command guard, plus background shells you can poll and kill.
Web
Fetch and search the web behind an SSRF guard and a response-size cap.
Flow
Track todos with dependencies, hold an active goal, wait, and move in and out of plan mode.
Agents
Dispatch sub-agents, ask the user, and invoke skills.
Memory
Record why a non-obvious change was made, then read it back across sessions and model switches. Project-scoped notes persist as well.
Git worktrees
Branch the session into a throwaway worktree and clean it up safely.
Notebooks
Edit Jupyter notebook cells with the same stale-file guard as files.
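The Web entry in the list above names two protections, an SSRF guard and a response-size cap. The Python sketch below shows the general shape of each. It is an illustration, not tomte's implementation: is_blocked_address and read_capped are hypothetical names, and a real guard would also have to check every address a hostname resolves to, and re-check after each redirect.

```python
import ipaddress
from typing import Iterable


def is_blocked_address(address: str) -> bool:
    """Return True when an IP literal points somewhere a fetcher should not reach."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_loopback        # 127.0.0.0/8, ::1
        or ip.is_private      # 10/8, 172.16/12, 192.168/16, and similar ranges
        or ip.is_link_local   # 169.254/16, home of cloud metadata endpoints
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified  # 0.0.0.0, ::
    )


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate streamed chunks, refusing a body larger than max_bytes."""
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if len(body) > max_bytes:
            raise ValueError(f"response exceeds {max_bytes} bytes")
    return bytes(body)
```

Checking the size while streaming, rather than trusting a Content-Length header, keeps the cap effective even when the server misreports the body length.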
Stale-file guards refuse a write when a file has changed since the model last read it. Destructive shell commands are flagged for confirmation, and incomplete streamed tool calls are dropped rather than executed with half-finished arguments.
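Dropping an incomplete streamed tool call can be sketched as follows, assuming arguments arrive as JSON text fragments keyed by a call id. That wire format and the name complete_calls are assumptions made for illustration, not a description of tomte's internals.

```python
import json
from typing import Dict, Iterable, List, Tuple


def complete_calls(fragments: Iterable[Tuple[str, str]]) -> List[Tuple[str, dict]]:
    """Join (call_id, text) fragments; keep only calls whose arguments parse as a JSON object."""
    buffers: Dict[str, str] = {}
    for call_id, text in fragments:
        buffers[call_id] = buffers.get(call_id, "") + text

    ready: List[Tuple[str, dict]] = []
    for call_id, raw in buffers.items():
        try:
            args = json.loads(raw)
        except json.JSONDecodeError:
            continue  # truncated stream: drop the call instead of guessing the rest
        if isinstance(args, dict):
            ready.append((call_id, args))
    return ready
```

A stream cut off in the middle of a shell command's argument leaves that call unparseable, so it never reaches the executor, while calls that finished streaming still run.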
Reasoning effort, dialled in.
Choose how hard the model thinks, for the session or a single turn. The newest Claude models use adaptive thinking; for OpenAI models, the same levels map to reasoning effort.
Set it with tomte config --set-reasoning high, or /thinking inside the session.
Slash commands worth knowing.
Evidence
- /prove
- Run the project's own checks and show the proof capsule.
- /twin
- The repo's five verifiable indexes, built and cached.
- /why-context <seed>
- Which files belong in context for a file or symbol, and why.
- /pulse
- The files most likely to break next, formula on the card.
- /handoff
- The paste-ready shift report for the next session.
The trail
- /why
- Read the decision trail: why past changes were made.
- /blame <file>
- One decision per line for a single file.
- /rewind
- Restore an earlier turn and revert the edits made since.
Spend and context
- /usage
- Live provider quota and rate-limit snapshot.
- /cost
- Per-model token tally and estimated dollars, cache-aware.
- /context
- Context-window usage and where the tokens are going.
- /compact <focus>
- Compact the conversation, steering what the summary keeps.
Session
- /model
- Switch the active model mid-session.
- /resume
- Pick a previous session and continue it.
- /plan
- Enter read-only plan mode before acting.
- /buddy
- Hatch the pixel companion, or reset and hide it.
Composer prefixes
Three characters you type at the start of a line: quick inline actions without leaving the composer.
- @path
- Attach a file or directory listing with a gitignore-aware typeahead.
- !command
- Run a shell command immediately, no model turn. Output feeds the next message.
- #note
- Append a note to the project CLAUDE.md and re-apply memory to the live session.
The security model, stated plainly.
run_shell runs inside an OS-level sandbox, confined to the workspace with the network off by default. On Windows that confinement is best-effort, so review prompts for destructive commands carefully there. Here is what tomte guards, in detail.
Commands run in an OS sandbox
run_shell runs inside an OS-level sandbox: Landlock and seccomp on Linux, sandbox-exec on macOS, confining writes to the workspace with outbound network off by default. On Windows it is best-effort process-tree cleanup only, so review destructive prompts there. On top of that, tomte flags obvious destructive commands like rm -rf on home or system paths, curl piped to a shell, mkfs, and force-pushes, and refuses them until you explicitly override.
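The destructive-command flag can be pictured as a small pattern table. The Python sketch below covers only the four examples named above. The patterns and the name flag_destructive are illustrative guesses, since tomte's actual rule set is not given here.

```python
import re
from typing import Optional

# Illustrative patterns only, one per example named in the text.
_PATTERNS = [
    (
        "rm -rf on a home or system path",
        re.compile(
            r"\brm\s+(?:-\w+\s+)*-\w*[rR]\w*\s+(?:-\w+\s+)*"
            r"(?:~|\$HOME|/(?:etc|usr|bin|sbin|var|lib|boot|home)?)"
            r"(?:/\S*)?(?:\s|$)"
        ),
    ),
    ("curl piped to a shell", re.compile(r"\bcurl\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b")),
    ("mkfs", re.compile(r"\bmkfs(?:\.\w+)?\b")),
    ("git force-push", re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)")),
]


def flag_destructive(command: str) -> Optional[str]:
    """Return a reason when the command matches a destructive pattern, else None."""
    for reason, pattern in _PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

A caller would hold any command with a non-None reason until the user explicitly overrides. Pattern matching of this kind only catches the obvious cases, which is why it sits on top of the sandbox rather than replacing it.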
Secrets stay out of the shell
Environment variables that look like secrets are stripped from child processes so the model cannot read them back. That covers names containing TOKEN, SECRET, KEY, OPENAI, AWS, or GITHUB, plus connection strings and vendor prefixes.
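The name-based part of that filter is simple to sketch. The Python below uses the six markers listed above. The function name scrub_env is hypothetical, and the checks for connection strings and vendor prefixes are left out because their rules are not spelled out here.

```python
import os
from typing import Dict, Mapping, Optional

# Name fragments taken from the list in the text.
SECRET_NAME_MARKERS = ("TOKEN", "SECRET", "KEY", "OPENAI", "AWS", "GITHUB")


def scrub_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without variables whose names look secret."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not any(marker in name.upper() for marker in SECRET_NAME_MARKERS)
    }
```

The result would be passed as the env argument of subprocess.run when launching a child process. Substring matching errs on the side of over-stripping: a harmless variable such as AWS_REGION is removed too, which is the safer failure mode.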
Writes are guarded
Stale-file guards refuse a write when a file has changed since the model last read it. auto_approve_write is false by default, and sub-agents inherit the parent approval policy.
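One way to build a stale-file guard is to remember a content hash at read time and compare it before every write. The Python sketch below takes that approach. GuardedFiles and StaleFileError are hypothetical names, and hashing is an assumption, since the text does not say how tomte detects a change.

```python
import hashlib
from pathlib import Path
from typing import Dict


class StaleFileError(RuntimeError):
    """Raised when a write targets a file that changed since it was last read."""


class GuardedFiles:
    def __init__(self) -> None:
        self._seen: Dict[Path, str] = {}  # resolved path -> digest at last read

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def read(self, path: Path) -> str:
        data = path.read_bytes()
        self._seen[path.resolve()] = self._digest(data)
        return data.decode("utf-8")

    def write(self, path: Path, text: str) -> None:
        key = path.resolve()
        expected = self._seen.get(key)
        if expected is None:
            raise StaleFileError(f"{path} was never read in this session")
        if self._digest(path.read_bytes()) != expected:
            raise StaleFileError(f"{path} changed since it was last read")
        data = text.encode("utf-8")
        path.write_bytes(data)
        self._seen[key] = self._digest(data)  # the write itself counts as the latest view
```

In this sketch a file that was never read cannot be written at all, so creating new files would need a separate path. Comparing content rather than modification time avoids false alarms when a file is touched without being changed.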
Credentials are owner-only
OAuth uses PKCE and refreshes automatically. Tokens are written with owner-only permissions on Unix and an owner-only ACL on Windows. Project permission allow-lists reject symlinked paths so an allow decision cannot be redirected.
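On Unix, an owner-only write and a symlink check each take a few lines. The Python sketch below assumes POSIX mode bits, and both function names are hypothetical. The Windows ACL path is not shown, and the symlink check is one plausible reading of the allow-list rule, not tomte's documented behaviour.

```python
import os
from pathlib import Path


def write_owner_only(path: Path, secret: str) -> None:
    """Create or replace a file that only its owner can read or write (mode 0600)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)  # the mode applies only when the file is created
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        if hasattr(os, "fchmod"):
            os.fchmod(handle.fileno(), 0o600)  # tighten a pre-existing file too
        handle.write(secret)


def has_symlink_component(path: Path) -> bool:
    """Return True when the path or any of its parent directories is a symlink."""
    candidate = Path(os.path.abspath(path))
    return any(part.is_symlink() for part in (candidate, *candidate.parents))
```

Setting the mode at creation time, instead of writing first and restricting afterwards, avoids a window in which the token is readable by other users. An allow-list built this way would refuse any entry for which has_symlink_component returns True.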
Questions.
What does tomte do that other coding agents do not?
Four things ship together here and nowhere else: a proof capsule built from your project's own checks (the model cannot fabricate it), a decision trail that survives switching models mid-project, a verifiable map of the repository that answers which files belong in context and why, and an agent tournament judged deterministically on measured evidence. Pulse and Handoff compose those indexes into a risk card and a shift report.
Will my Claude Code or Codex setup work?
Yes. tomte keeps the muscle memory: a terminal UI, slash commands, plan mode, composer prefixes, and inherited AGENTS.md and CLAUDE.md memory. Your existing skills and sub-agents are discovered automatically.
Do I need an API key?
No. You can sign in with a ChatGPT or Claude subscription over OAuth. API keys also work and unlock the full model catalogue. Environment keys are picked up automatically.
Which providers and models are supported?
The OpenAI GPT-5 family, Anthropic's Claude Fable 5 and Claude 4 families, and any OpenAI-compatible endpoint including local Ollama and LM Studio. See the Models page for the current catalogue.
Is my code sent anywhere, and is it sandboxed?
Your prompts and the files the agent reads go to the provider you choose, the same as any coding assistant. run_shell runs inside an OS-level sandbox (Landlock and seccomp on Linux, sandbox-exec on macOS; default workspace-write with outbound network off), and tomte flags obvious destructive commands on top. On Windows the sandbox is best-effort process cleanup only, so review destructive commands there.
What platforms run it?
Prebuilt binaries cover Linux x86-64, macOS on Intel and Apple Silicon, and Windows x86-64. You can also build from source with stable Rust.
Why is it called tomte?
The tomte is the Nordic farm spirit who keeps the household in order overnight: meticulous, quiet, and intolerant of sloppy work. The agent also hatches a pixel companion in the corner of the terminal, because a night watch is better with company.
Put the keeper in your terminal.
One binary, then sign in. The full agent in under a minute.