Premise
This blog is a live experiment in human–AI co-authorship. Most posts here are drafted, edited, or revisited by AI agents under my direction. Every revision the system can observe is logged: the model that made it, when it landed, what changed, and (when captured) the full session trace that produced it.
Some posts are mine alone — hand-written, untouched by AI. The two are kept visually distinct. The rest of this page is the dashboard of the AI side.
At a glance
Authors
Every author who has touched a post — humans in green, models in violet/blue. Hover a row to see which posts they edited and when.
- Opus 4.7 23 last Apr 27Posts touched (14)
- Lifting Auto-Research
- Convergence as a first-class eval primitive
- The ensemble and the edit
- Teaching Agents to Improve Themselves
- RL Without Gradients
- Sandboxes All the Way Down
- +8 more
- Opus 4.6 11 last Mar 18Posts touched (11)
- Teaching Agents to Improve Themselves
- RL Without Gradients
- Sandboxes All the Way Down
- Multi-Agent Orchestration with Convergence Loops
- Anatomy of an Autonomous Security Audit
- Vibecoding a Browser Agent
- +5 more
Most edited posts
Posts the loop keeps coming back to. Hover the row to see the revision timeline. Each dot is one edit, colored by model, plotted on the post's lifetime.
- 6 The ensemble and the edit Apr 23 → Apr 24
- 4 Teaching Agents to Improve Themselves Mar 18 → Apr 24
- 3 Convergence as a first-class eval primitive Apr 24 → Apr 24
- 2 RL Without Gradients Mar 16 → Apr 23
- 2 Sandboxes All the Way Down Mar 15 → Apr 23
- 2 Multi-Agent Orchestration with Convergence Loops Mar 14 → Apr 23
- 2 Anatomy of an Autonomous Security Audit Mar 13 → Apr 23
- 2 Vibecoding a Browser Agent Mar 11 → Apr 23
- 2 Convergence in Multi-Agent Review Loops Mar 10 → Apr 23
- 2 Building a Browser Agent That Doesn't Get Stuck Mar 7 → Apr 23
- 2 The Expected Cost of Fallback Chains Mar 5 → Apr 23
- 2 Exploit-or-Disprove: Adversarial Validation of Security Findings Mar 3 → Apr 23
Recent activity
The last 20 edits the system observed. Open the full trace log →
- captured session · 2 asst turns · 1 tool calls
- initial draft — StatGrid baseline, Scorecard for audit scenario, three-layer scoring, monotonicity, resumable runs
- polish pass: Sidenote on PoC realism; closing rewritten from meta-commentary to concrete per-criterion diagnosis
- diagram fix: legend moved below plot area so labels no longer cut off
- polish pass: stray tex formula removed, narrator-smoothing example added showing three debugging cycles compressed into one clause
- closing tightened to the concrete shift — one overnight run, six failure modes classified, four fixed, $30 of compute replacing two weeks of diagnostic work
- 10 asst turns, 9 tool calls captured
- draft — first framing around linear extensions, rejected by operator as muddled
- full rewrite — scrapped order-theory angle, reframed around shadcn-chat baseline and ensemble-vs-edit as real options
- richness pass: built ChatMock component, added four rendered mockups inline (baseline / ensemble / edit / combined)
- ChatMock redesigned: replaced wireframe transcript with rounded-card layout, rounded-lg tool pills with SVG icon badges, run_group collapsible containers
- 5 asst turns, 5 tool calls captured
- composition diagram redrawn: /evolve becomes the outer container wrapping the three commands; phase track nested inside; switched palette to current B&W + action-color vocabulary
- 3 asst turns, 3 tool calls captured
- 2 asst turns, 2 tool calls captured
- 2 asst turns, 2 tool calls captured
- 4 asst turns, 2 tool calls captured
- 2 asst turns, 2 tool calls captured
- 1 asst turns, 1 tool calls captured
- 2 asst turns, 2 tool calls captured
The rules
- Any post without
original: truein its frontmatter - Frontmatter (title, description, tags) when revising content
- Components in
src/components/and layouts - The trace-capture pipeline itself
- Any post with
original: true— hard rule, in CLAUDE.md - The list of authors on a human post
- Revision logs older than the agent's session start
Methodology
How the dashboard above is built.
- 1 Capture
A post-commit hook checks whether any
src/content/posts/*.mdxfile changed. If so, the script reads the active session log (Claude Code or Codex), extracts the model, timestamp, and a one-line change note, and appends a revision entry to the post's frontmatter with atrace_id. - 2 Trace store
Each captured session is written to
src/content/traces/<trace_id>.json— the full message stream, tool calls, and artifacts. The traces collection is loaded by Astro's content layer at build time. - 3 Reconstruction
Posts written before the capture pipeline existed have a single
reconstructed: truerevision back-filled from the git history. Models for those entries are best-guess, marked clearly in the UI. - 4 Aggregation
This page reads every post's frontmatter at build time, joins it with the trace store, and renders the dashboard. There is no database. Everything you see here is in the repo.