Opus 4.7 claude-code
Exploit-or-Disprove: Adversarial Validation of Security Findings

captured session · 3 asst turns · 3 tool calls
Created: Mar 3, 2026
Updated: Apr 23, 2026, 11:34 PM
Files

src/content/posts/exploit-or-disprove.mdx
Commit

ff1e9a5
Conversation

3 turns. Full text where captured; older traces show only the first ~280 chars.
assistant #1 11:34 PM 1 tool
- Edit
  
  (input not captured in this trace)
assistant #2 11:34 PM 1 tool
- Edit
  
  (input not captured in this trace)
assistant #3 11:34 PM 1 tool
- Edit
  
  (input not captured in this trace)
Diff

Per-file changes from ff1e9a5.
src/content/posts/exploit-or-disprove.mdx
diff --git a/src/content/posts/exploit-or-disprove.mdx b/src/content/posts/exploit-or-disprove.mdx
new file mode 100644
index 0000000..6730f79
--- /dev/null
+++ b/src/content/posts/exploit-or-disprove.mdx
@@ -0,0 +1,130 @@
+---
+title: 'Exploit-or-Disprove: Adversarial Validation of Security Findings'
+description: 'Automated security auditing produces false positives. The fix is a second agent whose only job is to write a working exploit or downgrade the finding.'
+date: 2026-03-03
+tags: ['security', 'agents', 'systems']
+---
+
+import Chart from '../../components/Chart.astro'
+
+Automated security auditing has a false positive problem. Run an LLM-based auditor over a smart contract and it'll flag 20 vulnerabilities. Maybe 6 are real. The other 14 are theoretical, impossible given the actual execution context, or outright hallucinated.
+
+The standard fix is manual triage: a human reads each finding and decides. This doesn't scale. Here's a better approach: a second agent whose sole job is to *exploit* each high-severity finding or *prove it can't be exploited*.
+
+## The protocol
+
+When the primary auditor produces findings, any finding rated High or Critical triggers a validation pass:
+
+1. A specialized agent (the "pentester") receives the finding + the codebase
+2. It attempts to write a proof-of-concept exploit, a test that demonstrates the vulnerability
+3. If the PoC compiles, runs, and demonstrates the claimed impact → **confirmed**
+4. If after a full attempt the agent cannot produce a working PoC → **downgraded** to Medium or Low
+5. Unproven findings are explicitly tagged `[unproven]` in the final report
+
+This is adversarial by design. The auditor's incentive is to find issues. The pentester's incentive is to prove or disprove them. The final report reflects the *intersection* of both perspectives.
+
+## Deduplication via similarity scoring
+
+Before validation, we need to merge duplicate findings. Multiple auditor agents working in parallel will often flag the same issue in slightly different words. Naive exact matching misses these. We use a combination of Jaccard similarity on token sets and n-gram overlap:
+
+$$
+\text{sim}(a, b) = \alpha \cdot J(T_a, T_b) + (1 - \alpha) \cdot \frac{|N_a \cap N_b|}{|N_a \cup N_b|}
+$$
+
+where $T$ is the token set, $N$ is the set of character n-grams (typically $n=3$), and $\alpha = 0.5$ balances the two. Findings with $\text{sim} > 0.6$ within the same vulnerability category are merged.
+
+The category constraint is important: a SQL injection finding and an XSS finding might share boilerplate language about "user input" and "sanitization." Without category gating, they'd incorrectly merge.
+
+<Chart
+  id="validation-sankey"
+  code={`
+const W = 700, H = 360
+const canvas = document.createElement('canvas')
+canvas.width = W; canvas.height = H
+const ctx = canvas.getContext('2d')
+
+const style = getComputedStyle(document.documentElement)
+const fg = style.getPropertyValue('--fg').trim() || '#111'
+const faint = style.getPropertyValue('--fg-faint').trim() || '#999'
+const bg = style.getPropertyValue('--bg').trim() || '#fff'
+
+ctx.fillStyle = bg; ctx.fillRect(0, 0, W, H)
+
+const cols = [90, 260, 430, 610]
+const top = 80
+
+function drawBox(x, y, w, h, label, count, shade) {
+  ctx.fillStyle = shade || fg
+  ctx.fillRect(x, y, w, h)
+  ctx.fillStyle = bg; ctx.font = 'bold 14px JetBrains Mono, monospace'
+  ctx.textAlign = 'center'; ctx.textBaseline = 'middle'
+  ctx.fillText(count, x + w/2, y + h/2 - 8)
+  ctx.font = '10px JetBrains Mono, monospace'
+  ctx.fillText(label, x + w/2, y + h/2 + 9)
+}
+
+function drawFlow(x1, y1, h1, x2, y2, h2, shade) {
+  ctx.globalAlpha = 0.12
+  ctx.fillStyle = shade || fg
+  ctx.beginPath()
+  ctx.moveTo(x1, y1); ctx.lineTo(x2, y2)
+  ctx.lineTo(x2, y2 + h2); ctx.lineTo(x1, y1 + h1)
+  ctx.closePath(); ctx.fill()
+  ctx.globalAlpha = 1
+}
+
+// labels
+ctx.fillStyle = faint; ctx.font = '12px JetBrains Mono, monospace'; ctx.textAlign = 'center'
+ctx.fillText('Raw findings', cols[0], top - 25)
+ctx.fillText('After dedup', cols[1], top - 25)
+ctx.fillText('By severity', cols[2], top - 25)
+ctx.fillText('After validation', cols[3], top - 25)
+
+// Column 1: Raw
+drawBox(cols[0] - 35, top, 70, 200, 'raw', '24', '#666')
+
+// Column 2: Deduped
+drawBox(cols[1] - 35, top + 20, 70, 160, 'deduped', '16', '#888')
+drawFlow(cols[0] + 35, top, 200, cols[1] - 35, top + 20, 160, '#888')
+
+// Column 3: By severity
+drawBox(cols[2] - 35, top, 70, 45, 'critical', '3', '#333')
+drawBox(cols[2] - 35, top + 55, 70, 45, 'high', '5', '#555')
+drawBox(cols[2] - 35, top + 110, 70, 40, 'medium', '4', '#888')
+drawBox(cols[2] - 35, top + 160, 70, 40, 'low', '4', '#bbb')
+drawFlow(cols[1] + 35, top + 20, 35, cols[2] - 35, top, 45, '#333')
+drawFlow(cols[1] + 35, top + 55, 45, cols[2] - 35, top + 55, 45, '#555')
+drawFlow(cols[1] + 35, top + 105, 35, cols[2] - 35, top + 110, 40, '#888')
+drawFlow(cols[1] + 35, top + 145, 35, cols[2] - 35, top + 160, 40, '#bbb')
+
+// Column 4: Validated
+drawBox(cols[3] - 35, top, 70, 40, 'confirmed', '4', '#333')
+drawBox(cols[3] - 35, top + 50, 70, 40, 'unproven', '3', '#888')
+drawBox(cols[3] - 35, top + 100, 70, 35, 'rejected', '1', '#bbb')
+drawBox(cols[3] - 35, top + 150, 70, 50, 'pass-thru', '8', '#aaa')
+
+// flows from severity to validation
+drawFlow(cols[2] + 35, top, 45, cols[3] - 35, top, 40, '#333')
+drawFlow(cols[2] + 35, top, 45, cols[3] - 35, top + 50, 40, '#888')
+drawFlow(cols[2] + 35, top + 55, 45, cols[3] - 35, top, 40, '#333')
+drawFlow(cols[2] + 35, top + 55, 45, cols[3] - 35, top + 50, 40, '#888')
+drawFlow(cols[2] + 35, top + 55, 45, cols[3] - 35, top + 100, 35, '#bbb')
+drawFlow(cols[2] + 35, top + 110, 90, cols[3] - 35, top + 150, 50, '#aaa')
+
+container.appendChild(canvas)
+  `}
+/>
+
+In a typical audit of a DeFi protocol: 24 raw findings → 16 after dedup → 8 high/critical sent to validation → 4 confirmed with working PoCs, 3 downgraded (unproven), 1 rejected outright. The 8 medium/low findings pass through without validation because the cost of validating everything isn't worth it.
+
+## Why this works
+
+The key insight is that **writing an exploit is a fundamentally different task than identifying a vulnerability**. The auditor reasons abstractly about code paths and invariants. The pentester has to make something *compile and run*. Many "vulnerabilities" that look plausible in abstract reasoning fall apart when you try to construct actual calldata that triggers them.
+
+This is especially true for reentrancy in modern Solidity. The auditor sees a state change after an external call and flags it, but the actual contract might have a reentrancy guard, or the callback context might not allow the reentrant path. The pentester discovers this by trying and failing.
+
+## The cost tradeoff
+
+Validation roughly doubles the compute cost for high-severity findings. But it dramatically reduces the human triage burden. If you're producing audit reports that a human needs to act on, the difference between "20 findings, figure out which matter" and "4 confirmed with PoCs, 4 unproven, 12 low-risk" is the difference between a useful tool and noise.
+
+For smart contract audits where a single confirmed critical finding might prevent a multi-million dollar exploit, the compute cost of validation is negligible against the cost of false positives.
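
The similarity score in the post's "Deduplication via similarity scoring" section is given only as a formula, so here is a minimal sketch of how it might look in code. It assumes the n-gram term is a Jaccard ratio over character trigrams, as the formula suggests, and uses the post's stated values of α = 0.5 and a 0.6 merge threshold. The helper names (`tokenize`, `charNgrams`, `shouldMerge`, `dedupe`), the `{ category, text }` finding shape, the lowercase tokenizer, and the greedy keep-first merge are all hypothetical choices for illustration, not the post's implementation.

```javascript
// Sketch of sim(a, b) = alpha * J(T_a, T_b) + (1 - alpha) * |N_a ∩ N_b| / |N_a ∪ N_b|
// All names and the { category, text } finding shape are illustrative assumptions.

// Jaccard index of two Sets. Two empty sets are scored 0 here (an assumption;
// the ratio is undefined in that case).
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0
  let shared = 0
  for (const item of a) if (b.has(item)) shared += 1
  return shared / (a.size + b.size - shared)
}

// T: set of lowercase alphanumeric tokens (assumed tokenizer).
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9_]+/g) || [])
}

// N: set of character n-grams, n = 3 by default, after whitespace normalization.
function charNgrams(text, n = 3) {
  const s = text.toLowerCase().replace(/\s+/g, ' ').trim()
  const grams = new Set()
  for (let i = 0; i + n <= s.length; i++) grams.add(s.slice(i, i + n))
  return grams
}

// Weighted blend of token-set Jaccard and n-gram overlap.
function similarity(a, b, alpha = 0.5) {
  const tokenScore = jaccard(tokenize(a), tokenize(b))
  const ngramScore = jaccard(charNgrams(a), charNgrams(b))
  return alpha * tokenScore + (1 - alpha) * ngramScore
}

// Category gating: findings in different categories never merge.
function shouldMerge(f1, f2, threshold = 0.6) {
  if (f1.category !== f2.category) return false
  return similarity(f1.text, f2.text) > threshold
}

// Greedy dedup: keep the first finding of each near-duplicate group.
function dedupe(findings, threshold = 0.6) {
  const kept = []
  for (const f of findings) {
    if (!kept.some((k) => shouldMerge(k, f, threshold))) kept.push(f)
  }
  return kept
}
```

Checking the category before computing the score keeps shared boilerplate ("user input", "sanitization") from merging unrelated findings, which is the failure mode the post warns about. The greedy pass depends on input order. A real pipeline might instead cluster findings or merge the text of duplicates rather than dropping them.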