The experiment

Premise

This blog is a live experiment in human–AI co-authorship. Most posts here are drafted, edited, or revisited by AI agents under my direction. Every revision the system can observe is logged: the model that made it, when it landed, what changed, and (when captured) the full session trace that produced it.

Some posts are mine alone: hand-written and untouched by AI. The two kinds are kept visually distinct. The rest of this page is the dashboard for the AI side.

At a glance

- 14 AI-touched posts (plus 2 written by hand)
- 34 tracked revisions (11 reconstructed from old commits)
- 2 distinct models (Opus 4.7 leads with 23 revisions)
- 2 human originals (0 human edits on AI posts)

Authors

Every author who has touched a post, with their revision count, the date of their latest edit, and the posts they edited.

Opus 4.7 · 23 revisions · last edit Apr 27
Posts touched (14)
- Lifting Auto-Research
- Convergence as a first-class eval primitive
- The ensemble and the edit
- Teaching Agents to Improve Themselves
- RL Without Gradients
- Sandboxes All the Way Down
- and 8 more

Opus 4.6 · 11 revisions · last edit Mar 18
Posts touched (11)
- Teaching Agents to Improve Themselves
- RL Without Gradients
- Sandboxes All the Way Down
- Multi-Agent Orchestration with Convergence Loops
- Anatomy of an Autonomous Security Audit
- Vibecoding a Browser Agent
- and 5 more
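The per-author rows amount to a group-by over every post's revision list. A minimal sketch, assuming each post's frontmatter has been parsed into a dict with a title and a revisions list whose entries carry a model name and an ISO 8601 timestamp; the revisions, model, and timestamp keys and the author_summary helper are hypothetical names, not the site's actual schema:

```python
from collections import defaultdict
from datetime import datetime


def author_summary(posts):
    """Group revisions by author: revision count, latest edit, posts touched.

    Hypothetical input shape:
    {"title": ..., "revisions": [{"model": ..., "timestamp": ...}, ...]}
    """
    summary = defaultdict(lambda: {"revisions": 0, "last_edit": None, "posts": []})
    for post in posts:
        for rev in post.get("revisions", []):
            row = summary[rev["model"]]
            row["revisions"] += 1
            edited = datetime.fromisoformat(rev["timestamp"])
            if row["last_edit"] is None or edited > row["last_edit"]:
                row["last_edit"] = edited
            # Record each post once, in the order it was first seen.
            if post["title"] not in row["posts"]:
                row["posts"].append(post["title"])
    # Most active author first.
    return sorted(summary.items(), key=lambda item: item[1]["revisions"], reverse=True)
```

Truncating each posts list to its first few titles plus a count yields the "and N more" form.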

Most edited posts

Posts the loop keeps coming back to.

Recent activity

The last 20 edits the system observed, newest first. Entries marked "trace" have a captured session trace.

- Apr 27, 2026 · Opus 4.7 · Lifting Auto-Research · captured session: 2 assistant turns, 1 tool call · trace a32598c
- Apr 24, 2026 · Opus 4.7 · Convergence as a first-class eval primitive · initial draft: StatGrid baseline, Scorecard for the audit scenario, three-layer scoring, monotonicity, resumable runs
- Apr 24, 2026 · Opus 4.7 · Convergence as a first-class eval primitive · polish pass: Sidenote on PoC realism; closing rewritten from meta-commentary to a concrete per-criterion diagnosis · trace
- Apr 24, 2026 · Opus 4.7 · Convergence as a first-class eval primitive · diagram fix: legend moved below the plot area so labels are no longer cut off · trace
- Apr 24, 2026 · Opus 4.7 · The ensemble and the edit · polish pass: stray TeX formula removed; narrator-smoothing example added, showing three debugging cycles compressed into one clause
- Apr 24, 2026 · Opus 4.7 · Teaching Agents to Improve Themselves · closing tightened to the concrete shift: one overnight run, six failure modes classified, four fixed, $30 of compute replacing two weeks of diagnostic work
- Apr 23, 2026 · Opus 4.7 · The ensemble and the edit · captured session: 10 assistant turns, 9 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · The ensemble and the edit · draft: first framing around linear extensions, rejected by the operator as muddled
- Apr 23, 2026 · Opus 4.7 · The ensemble and the edit · full rewrite: order-theory angle scrapped, reframed around the shadcn-chat baseline and ensemble-vs-edit as real options
- Apr 23, 2026 · Opus 4.7 · The ensemble and the edit · richness pass: built the ChatMock component, added four rendered mockups inline (baseline / ensemble / edit / combined)
- Apr 23, 2026 · Opus 4.7 · The ensemble and the edit · ChatMock redesigned: wireframe transcript replaced with a rounded-card layout, rounded-lg tool pills with SVG icon badges, run_group collapsible containers
- Apr 23, 2026 · Opus 4.7 · Teaching Agents to Improve Themselves · captured session: 5 assistant turns, 5 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Teaching Agents to Improve Themselves · composition diagram redrawn: /evolve becomes the outer container wrapping the three commands, with the phase track nested inside; palette switched to the current B&W + action-color vocabulary
- Apr 23, 2026 · Opus 4.7 · RL Without Gradients · captured session: 3 assistant turns, 3 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Sandboxes All the Way Down · captured session: 2 assistant turns, 2 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Multi-Agent Orchestration with Convergence Loops · captured session: 2 assistant turns, 2 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Anatomy of an Autonomous Security Audit · captured session: 4 assistant turns, 2 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Vibecoding a Browser Agent · captured session: 2 assistant turns, 2 tool calls · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Convergence in Multi-Agent Review Loops · captured session: 1 assistant turn, 1 tool call · trace ff1e9a5
- Apr 23, 2026 · Opus 4.7 · Building a Browser Agent That Doesn't Get Stuck · captured session: 2 assistant turns, 2 tool calls · trace ff1e9a5
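A feed like this can be produced by flattening every post's revisions into one list, sorting newest first, and keeping the first 20. A minimal sketch, assuming each post is a dict with a title and a revisions list whose entries carry ISO 8601 timestamps written in one consistent form; recent_activity, session_label, and the dict keys are hypothetical names. The plural helper is there so a one-call session reads "1 tool call" rather than "1 tool calls":

```python
from datetime import datetime


def plural(n, noun):
    """Return '1 tool call' or '2 tool calls', depending on n."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def session_label(assistant_turns, tool_calls):
    """Summary text for a captured session."""
    return (
        f"captured session: {plural(assistant_turns, 'assistant turn')}, "
        f"{plural(tool_calls, 'tool call')}"
    )


def recent_activity(posts, limit=20):
    """Flatten every post's revisions into one feed, newest first."""
    feed = [
        {**rev, "post": post["title"]}  # tag each revision with its post
        for post in posts
        for rev in post.get("revisions", [])
    ]
    # Timestamps must be all naive or all timezone-aware to be comparable.
    feed.sort(key=lambda rev: datetime.fromisoformat(rev["timestamp"]), reverse=True)
    return feed[:limit]
```

Python's sort is stable, so revisions with identical timestamps keep their input order.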

The rules

AI may edit

- Any post without original: true in its frontmatter
- Frontmatter (title, description, tags) when revising content
- Components in src/components/ and layouts
- The trace-capture pipeline itself

AI must not edit

- Any post with original: true (a hard rule, stated in CLAUDE.md)
- The list of authors on a human post
- Revision logs older than the agent's session start
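The original: true rule is stated in CLAUDE.md as an instruction to the agent. A mechanical backstop is easy to imagine; the following is a hypothetical sketch (is_original and check_edit are illustrative names, not part of this site) that refuses an edit when a post's frontmatter sets original: true. To stay dependency-free it matches the flag with a deliberately narrow regular expression; a real YAML parser would also accept other spellings of true, such as True.

```python
import re

# Assumption: frontmatter is the block between the opening and closing "---".
FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---", re.DOTALL)
# Narrow match: a top-level "original: true" line with nothing else on it.
ORIGINAL_FLAG = re.compile(r"^original:[ \t]*true[ \t]*$", re.MULTILINE)


def is_original(mdx_text):
    """True if the post's frontmatter marks it as a human original."""
    match = FRONTMATTER.match(mdx_text)
    return bool(match and ORIGINAL_FLAG.search(match.group(1)))


def check_edit(path, mdx_text):
    """Raise instead of letting an agent touch a human-only post."""
    if is_original(mdx_text):
        raise PermissionError(f"{path} has original: true; AI must not edit it")
```

Only the frontmatter is inspected, so a post body that merely mentions "original: true" does not trip the check.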

Methodology

How the dashboard above is built.

1. Capture

A post-commit hook checks whether any src/content/posts/*.mdx file changed. If one did, the hook's script reads the active session log (Claude Code or Codex), extracts the model, timestamp, and a one-line change note, and appends a revision entry with a trace_id to the post's frontmatter.
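The file check at the start of the hook can be sketched as follows. This is a minimal sketch, not the site's actual script: it asks git which files the last commit touched and keeps only .mdx files directly under src/content/posts/, mirroring the glob above. Parsing the Claude Code or Codex session log is left out because its format isn't described here, and revision_entry and its field names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath

POSTS_DIR = PurePosixPath("src/content/posts")


def files_in_last_commit():
    """Repo-relative paths touched by HEAD (--root also covers the first commit)."""
    out = subprocess.run(
        ["git", "diff-tree", "--root", "--no-commit-id", "--name-only", "-r", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def changed_posts(paths):
    """Keep .mdx files directly inside src/content/posts/, not subfolders."""
    return [
        p for p in paths
        if PurePosixPath(p).parent == POSTS_DIR and PurePosixPath(p).suffix == ".mdx"
    ]


def revision_entry(model, timestamp, note, trace_id):
    """Hypothetical shape of one entry appended to a post's frontmatter."""
    return {"model": model, "timestamp": timestamp, "note": note, "trace_id": trace_id}
```

In the hook, an empty changed_posts(files_in_last_commit()) result means there is nothing to record and the script can exit.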
2. Trace store

Each captured session is written to src/content/traces/<trace_id>.json: the full message stream, tool calls, and artifacts. The traces collection is loaded by Astro's content layer at build time.
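Writing one trace amounts to a single JSON dump. A minimal sketch, assuming the repository root is passed in and that the messages, tool calls, and artifacts are already JSON-serializable; write_trace and the payload keys are hypothetical, and the ID check is an added safeguard so a malformed trace_id cannot write outside the traces directory.

```python
import json
import re
from pathlib import Path

SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def write_trace(repo_root, trace_id, messages, tool_calls, artifacts):
    """Write one captured session to src/content/traces/<trace_id>.json."""
    if not SAFE_ID.fullmatch(trace_id):
        raise ValueError(f"unsafe trace_id: {trace_id!r}")
    path = Path(repo_root) / "src" / "content" / "traces" / f"{trace_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "id": trace_id,
        "messages": messages,  # full message stream, in order
        "tool_calls": tool_calls,
        "artifacts": artifacts,
    }
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```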
3. Reconstruction

Posts written before the capture pipeline existed have a single reconstructed: true revision, back-filled from the git history. The models on those entries are best guesses and are clearly marked as such in the UI.
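A back-filled revision can be derived from a post's oldest commit. A minimal sketch, assuming git is on the PATH; oldest_commit, parse_oldest, reconstructed_revision, the commit key, and the caller-supplied guessed_model are hypothetical, since the page does not say how the best guess is made.

```python
import subprocess


def parse_oldest(log_output):
    """Return (sha, date) from `git log --format=%H%x09%aI` output, or None."""
    lines = [line for line in log_output.splitlines() if line.strip()]
    if not lines:
        return None
    # git log lists newest first, so the oldest commit is on the last line.
    sha, when = lines[-1].split("\t", 1)
    return sha, when


def oldest_commit(path):
    """Oldest commit touching path, with its strict ISO 8601 author date."""
    out = subprocess.run(
        ["git", "log", "--format=%H%x09%aI", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_oldest(out)


def reconstructed_revision(sha, when, guessed_model):
    return {
        "model": guessed_model,  # a best guess, flagged as such in the UI
        "timestamp": when,
        "note": "reconstructed from git history",
        "commit": sha,
        "reconstructed": True,
    }
```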
4. Aggregation

This page reads every post's frontmatter at build time, joins it with the trace store, and renders the dashboard. There is no database. Everything you see here is in the repo.
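Because everything lives in frontmatter and the trace store, the headline numbers reduce to a few passes over in-memory data. A minimal sketch, assuming posts are parsed frontmatter dicts (with the original flag and a hypothetical revisions list) and traces is a dict keyed by trace_id; at_a_glance and its output keys are illustrative names only.

```python
from collections import Counter


def at_a_glance(posts, traces):
    """Headline numbers from post frontmatter joined with the trace store."""
    originals = [p for p in posts if p.get("original")]
    ai_posts = [p for p in posts if not p.get("original") and p.get("revisions")]
    revisions = [rev for p in ai_posts for rev in p["revisions"]]
    by_model = Counter(rev["model"] for rev in revisions)
    return {
        "ai_touched_posts": len(ai_posts),
        "human_originals": len(originals),
        "tracked_revisions": len(revisions),
        "reconstructed": sum(1 for rev in revisions if rev.get("reconstructed")),
        # The join: revisions whose trace_id resolves to a stored trace.
        "with_trace": sum(1 for rev in revisions if rev.get("trace_id") in traces),
        "distinct_models": len(by_model),
        "leader": by_model.most_common(1)[0] if by_model else None,
    }
```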