the night watch · v0.0.4 · MIT · Rust

Done means verified.

A calm, multi-model agent for your terminal. It maps the repository, remembers why, and never calls a thing done without evidence.

Named for the Nordic farm spirit who keeps the household in order overnight: meticulous, quiet, and intolerant of sloppy work.

proof capsule · verified

tomte

$ tomte prove
Proof Capsule  ·  2026-06-09 10:58
  files changed   1 (M README.md)
  ✅ test       passed   cargo test
  ✅ typecheck  passed   cargo check
  ✅ lint       passed   cargo clippy
  ✅ build      passed   cargo build
  reproduce: cargo test && cargo clippy

The model never supplies these numbers. It cannot fabricate a green capsule, only explain one the CLI already collected.

what no other terminal agent ships together

Most agents tell you the work is done. This one shows receipts.

Done means verified

verified

/prove assembles an evidence bundle that the CLI gathers itself: the files git reports changed, plus the real exit codes of your project's own test, typecheck, lint, and build commands. tomte prove exits non-zero on failure, so it can gate a commit hook or CI step.
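
The gating behaviour can be pictured with a minimal sketch. It is not tomte's actual implementation: `CheckResult`, `run_check`, and `capsule_exit_code` are hypothetical names, and the sketch assumes a POSIX `sh` is available to run each check.

```rust
use std::process::Command;

/// Outcome of one project check, as observed by the CLI (never by the model).
/// All names in this sketch are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckResult {
    pub name: &'static str,
    pub command: &'static str,
    /// `None` means the command could not be started or was killed by a signal.
    pub exit_code: Option<i32>,
}

impl CheckResult {
    /// A check passes only when the process exited with code 0.
    pub fn passed(&self) -> bool {
        self.exit_code == Some(0)
    }
}

/// Run one check through the shell and record its real exit code.
/// Assumption: a POSIX `sh` is on the PATH.
pub fn run_check(name: &'static str, command: &'static str) -> CheckResult {
    let exit_code = Command::new("sh")
        .arg("-c")
        .arg(command)
        .status()
        .ok()
        .and_then(|status| status.code());
    CheckResult { name, command, exit_code }
}

/// Exit code for the whole capsule: 0 only if every check passed, 1 otherwise.
pub fn capsule_exit_code(results: &[CheckResult]) -> i32 {
    if results.iter().all(CheckResult::passed) { 0 } else { 1 }
}
```

A caller would collect `run_check("test", "cargo test")` and its siblings into a vector, then hand `capsule_exit_code(&results)` to `std::process::exit`, which is what lets a hook or CI step stop on a red capsule.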

It remembers why, across models

on record

record_decision appends the reasoning behind every non-obvious change to a decision trail that is re-injected each session. Next month's session, or a different model entirely, inherits the why and not just the diff. Drift Watch flags any decision whose code has since moved out from under it.

tomte

$ tomte why src/parser.rs:88
src/parser.rs:88
  decision  empty input returns Err, not a panic
  why       a library must never crash its caller
  rejected  panic: crashes callers
  recorded  gpt-5.5 · turn 5 · anchor fresh

The trail is an append-only file in your project state. Overturning a decision is recorded as a supersede, never an erase.
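
One way to picture an append-only trail with supersedes is the sketch below. The `Decision` and `Trail` types and their fields are hypothetical, chosen to mirror the card above; the on-disk format tomte actually uses is not described here.

```rust
use std::collections::HashSet;

/// One entry in the trail. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Decision {
    pub anchor: String, // e.g. "src/parser.rs:88"
    pub decision: String,
    pub why: String,
    /// Index of an earlier entry this one overturns, if any.
    pub supersedes: Option<usize>,
}

/// An append-only list of decisions: entries are added, never edited or removed.
#[derive(Debug, Default)]
pub struct Trail {
    entries: Vec<Decision>,
}

impl Trail {
    /// Append an entry and return its index.
    pub fn record(&mut self, entry: Decision) -> usize {
        self.entries.push(entry);
        self.entries.len() - 1
    }

    /// Number of entries ever recorded, superseded ones included.
    pub fn total_recorded(&self) -> usize {
        self.entries.len()
    }

    /// Entries that no other entry has superseded.
    pub fn active(&self) -> Vec<&Decision> {
        let superseded: HashSet<usize> =
            self.entries.iter().filter_map(|e| e.supersedes).collect();
        self.entries
            .iter()
            .enumerate()
            .filter(|(i, _)| !superseded.contains(i))
            .map(|(_, e)| e)
            .collect()
    }
}
```

Because an overturned entry stays in `entries`, the history of what was once believed remains readable even after `active()` stops returning it.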

It knows the house

mapped

tomte twin builds five verifiable indexes straight from the source: import graph, symbol graph, test-to-source map, git recency, and project conventions. tomte why-context answers the question context-stuffing agents dodge: which files actually belong in context, and why.

tomte

$ tomte why-context classify_danger
Context X-Ray for `classify_danger`
Selected (would pull into context):
  • tools/shell.rs
      because imports the seed [import]
  • race/judge.rs
      because judge.rs:6 references it [symbol]
Ignored (nearby but left out):
  • tools/web.rs — no path reaches it

Every claim is grounded in a real import edge, definition, test, or commit. A generic name cannot manufacture a false reference.
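
As a rough illustration of selection grounded in import edges, the sketch below walks the import graph backwards from a seed file and keeps the edge that justifies each pick. It assumes the seed symbol has already been resolved to its defining file; `select_context` and the file names in the usage note are hypothetical, and the real tool also draws on symbols, tests, and commits.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Select files that reach `seed` through import edges, each with its reason.
/// `imports` holds (importer, imported) pairs. Hypothetical sketch.
pub fn select_context<'a>(
    seed: &'a str,
    imports: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, String> {
    let mut reasons: BTreeMap<&'a str, String> = BTreeMap::new();
    let mut seen: BTreeSet<&'a str> = BTreeSet::from([seed]);
    let mut queue: VecDeque<&'a str> = VecDeque::from([seed]);

    // Breadth-first walk over reverse import edges.
    while let Some(current) = queue.pop_front() {
        for &(importer, imported) in imports {
            // `insert` returns false for files already visited, so cycles terminate.
            if imported == current && seen.insert(importer) {
                reasons.insert(importer, format!("imports {current} [import]"));
                queue.push_back(importer);
            }
        }
    }
    reasons
}
```

With edges `("tools/shell.rs", "tools/danger.rs")` and `("tools/web.rs", "tools/http.rs")` and the seed `tools/danger.rs`, only `tools/shell.rs` is selected; `tools/web.rs` is left out because no import path reaches the seed.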

Don't trust one agent. Race them

measured

tomte race runs a task as a tournament: contestants that vary in model, effort, and style, each in its own isolated git worktree. The judge is deterministic and measures evidence: the project's own checks, diff size, added tests, and risky commands run. An LLM is never the referee.

tomte

$ tomte race "fix the flaky retry test" --agents 4
🏁 4 contestants · isolated worktrees
  1. minimal-patch   ✅ verified · +test · 38 lines
  2. gpt-5.5/high    ✅ verified · 112 lines
  3. opus-4-8/max    ⚠ checks failed (lint)
  4. gpt-5.5/low     ✖ no change
  winner: minimal-patch (smallest verified diff)
  patch saved · apply with --apply

Ranking is tiered so a clever-but-broken patch can never beat a working one. Every reason on the card comes from measured numbers.
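
Tiered ranking can be sketched as a sort on a two-part key. The tier order and the diff-size tie-break below are assumptions read off the sample card, not the judge's full rule set, and `Tier`, `Contestant`, and `rank` are hypothetical names.

```rust
/// Evidence tier; earlier variants sort first.
/// The order is an assumption based on the sample card.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Verified,
    ChecksFailed,
    NoChange,
}

/// One race entry with the measurements the judge compares.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Contestant {
    pub name: String,
    pub tier: Tier,
    pub diff_lines: u32,
}

/// Sort by tier first and diff size second, so a patch that fails its checks
/// cannot outrank a verified one however small it is.
pub fn rank(mut field: Vec<Contestant>) -> Vec<Contestant> {
    field.sort_by_key(|c| (c.tier, c.diff_lines));
    field
}
```

Because the tier is the leading component of the key, diff size only ever breaks ties inside a tier. A 20-line patch that fails lint still lands below a 112-line patch that passes.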

because the indexes are real data, they compose

The map answers questions you haven't asked yet.

Repo Pulse

Which files are most likely to break next, with the formula printed on the card: commits in the recent window, multiplied by import fan-in plus one, then doubled when no test covers the file. Rerun it, get the same card, argue with the numbers.

tomte

$ tomte pulse
Repo Pulse — your/repo
  1. core/src/tools/mod.rs
     risk 124 = 31c × 2i × untested ⚠
  2. tui/app/types.rs
     risk 76 = 19c × 2i × untested ⚠
  hot & untested: 65 source files
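
The formula on the card fits in one small function. This is a sketch under one assumption: the `2i` term on the sample card is read as fan-in plus one, so a fan-in of 1. `pulse_risk` is a hypothetical name.

```rust
/// Risk score as the card describes it:
/// recent commits × (import fan-in + 1), doubled when no test covers the file.
pub fn pulse_risk(recent_commits: u32, import_fan_in: u32, has_test: bool) -> u32 {
    let untested_factor = if has_test { 1 } else { 2 };
    recent_commits * (import_fan_in + 1) * untested_factor
}
```

Under that reading, 31 commits, a fan-in of 1, and no covering test give 31 × 2 × 2 = 124, matching the first row of the card. The second row works out the same way: 19 × 2 × 2 = 76.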

The Handoff capsule

One paste-ready markdown capsule: git standing, the newest recorded decisions with a drift-watch line, the map summary, and the pulse top. Built for the next session, whether that is a colleague, tomorrow's you, or a different model entirely.

tomte

$ tomte handoff --out HANDOFF.md
# Handoff — your/repo
## Where the tree stands
- branch `0.0.4` · working tree clean
## Why things are the way they are
- `parser.rs:88` — Err, not panic
- drift watch: 4 hold · 2 healed · 1 needs eyes
_Before you call anything done: tomte prove._

the keeper's manner

Quiet habits, wrapped around the proofs.

Glass box, not black box

Before a write or shell command runs, one calm line states what it changes and how far it reaches. A file's recorded decisions surface as house rules, so the agent re-reads its own constraints before it can break one.

An end-of-turn receipt

A turn that changes something closes with one line: files touched, tests run, and the why it recorded. The custodian leaves a note, every time.

A checkpoint every turn

/undo reverts the last file edit. /rewind restores the session to an earlier turn and also reverts the edits made since. Each picker row shows its blast radius before you commit to it.

Quiet, surgical, cross-platform

One Rust binary on Linux, macOS, and Windows. No daemon, no telemetry, a terminal UI that stays out of the way, and a pixel companion that hatches from an egg if you want company.

one binary, any brain

The trail, the map, and the proofs survive a model switch.

Sign in with a subscription or an API key. Switch mid-session with /model. Everything tomte records is provider-agnostic, so the why written by one model is read by the next.

OpenAI

GPT-5 family

The GPT-5 family over the Responses and Chat Completions APIs. A ChatGPT Plus, Pro, Team, or Enterprise subscription signs in over OAuth. An API key unlocks the full public catalogue.

Anthropic

Claude families

Claude Fable 5 and the Claude 4 family over the Messages API, with adaptive thinking on the newest models. A Claude Pro or Max subscription signs in over OAuth after a terms acknowledgement.

OpenAI-compatible

Any endpoint

Groq, OpenRouter, DeepSeek, xAI, Together, Fireworks, Cerebras, Mistral, and local Ollama or LM Studio work out of the box as provider/model. Anything else: declare a base URL and key under providers in config.json.

and the table stakes, done well

27 tools, zero daemons.

one binary

No daemon, no ceremony

A single tomte binary. Launch the full terminal UI or fire a one-shot from a script. Same agent either way, nothing running in the background.

your brain

Bring your own provider

Sign in with a ChatGPT or Claude subscription over OAuth, or paste an API key. Switch models mid-session. Add any OpenAI-compatible endpoint, local or hosted.

tool belt

A real tool belt, not a toy

Twenty-seven tools across files, shell, search, web, notebooks, sub-agents, memory, todos, and plan mode. Streamed, schema-validated, and run in parallel where it is safe.

lsp

Code intelligence, zero setup

The lsp tool gives symbols, go-to-definition, references, and hover for Rust, TypeScript, JavaScript, Python, and Go. No language server to install.

worktree

Experiment without fear

enter_worktree spins the session into an isolated git worktree. exit_worktree cleans it up after a safety check, so you never clobber main.

accounting

Knows what it is spending

/usage reads your provider's live quota. /cost tallies tokens and dollars per model, cache-aware. /context shows where the window is going.

failover

Stays up when a provider does not

List fallback models, and a rate-limit or overload error transparently switches the turn to the next one and keeps going instead of failing mid-task. Off by default.
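
The failover rule can be sketched as a loop over an ordered model list. `CallError` and `call_with_fallback` are hypothetical names, and the closure stands in for whatever actually sends the turn to a provider.

```rust
/// Errors a provider call might return. Hypothetical sketch type.
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    RateLimited,
    Overloaded,
    Other(String),
}

/// Try each model in order. Move to the next one only on a rate-limit or
/// overload error; any other error is returned immediately.
/// On success, returns the model that answered together with its reply.
pub fn call_with_fallback<F>(
    models: &[&str],
    mut call: F,
) -> Result<(String, String), CallError>
where
    F: FnMut(&str) -> Result<String, CallError>,
{
    let mut last = CallError::Other("no models configured".to_string());
    for &model in models {
        match call(model) {
            Ok(reply) => return Ok((model.to_string(), reply)),
            Err(e @ (CallError::RateLimited | CallError::Overloaded)) => last = e,
            Err(other) => return Err(other),
        }
    }
    // Every model was rate-limited or overloaded (or the list was empty).
    Err(last)
}
```

Returning the answering model alongside the reply keeps the switch visible to the caller, so accounting and the end-of-turn receipt can name the model that actually did the work.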

memory

Inherits your existing setup

AGENTS.md and CLAUDE.md from the git root down to your working directory fold into the system prompt. Existing skills and sub-agents are discovered automatically.
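
The root-to-working-directory walk can be sketched as pure path arithmetic. `instruction_candidates` is a hypothetical helper. The order of the two file names within a directory is an assumption, and a caller would still need to check which candidates exist on disk.

```rust
use std::path::{Path, PathBuf};

/// Candidate instruction files from `git_root` down to `cwd`, outermost first.
/// Returns an empty list when `cwd` is not inside `git_root`.
pub fn instruction_candidates(git_root: &Path, cwd: &Path) -> Vec<PathBuf> {
    let Ok(relative) = cwd.strip_prefix(git_root) else {
        return Vec::new();
    };

    // Collect every directory on the way down, starting at the root itself.
    let mut dir = git_root.to_path_buf();
    let mut dirs = vec![dir.clone()];
    for component in relative.components() {
        dir.push(component);
        dirs.push(dir.clone());
    }

    // Two candidate files per directory, in a fixed (assumed) order.
    dirs.iter()
        .flat_map(|d| ["AGENTS.md", "CLAUDE.md"].map(|name| d.join(name)))
        .collect()
}
```

Listing the outermost directory first means later, more specific files are appended after the general ones when the results are folded into a prompt.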

Let the keeper take the night shift.

Install tomte · tomte prove