Part 3 of 7 | April 2026 | 5 min

Teaching AI to See Its Own Mistakes

Vision | Quality
Scene Critique
Detected Issues
Quality Score: 72%
🚨 Lamp floating 2.3 m above floor
📏 Table oversized (3.2× expected)
⚠️ Chairs overlapping table

Fix Schema

type Fix =
  | { type: "move";    nodeId: string; position: [x: number, y: number, z: number]; reason: string }
  | { type: "scale";   nodeId: string; scale: number; reason: string }
  | { type: "recolor"; nodeId: string; color: string; reason: string }

The scene looked wrong and the AI didn't know.

30% of composed scenes have visible errors: a floating lamp, an oversized table, parts clipping through each other. The AI has only the scene JSON, so it can't see any of them.


The loop

Vision Critique Loop
Compose → Render to Canvas → Screenshot (PNG) → Vision Model, z.array(Fix) → Structured Fixes
$0.015 per run · 2–3 seconds · automatic
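
The stages above can be sketched as one small driver function. This is a minimal sketch, not the project's actual code: `CritiqueStages`, `critiqueLoop`, and `maxPasses` are hypothetical names, and the render and vision-model calls are passed in so that nothing here assumes a particular canvas or model API.

```typescript
// Sketch only: every name here is hypothetical, not the project's API.
interface CritiqueStages<Scene, Fix> {
  render(scene: Scene): Promise<Uint8Array>; // render to canvas, return PNG bytes
  critique(png: Uint8Array): Promise<Fix[]>; // vision model call, returns structured fixes
  apply(scene: Scene, fixes: Fix[]): Scene;  // apply the fixes to the scene
}

interface CritiqueResult<Scene, Fix> {
  scene: Scene;
  fixes: Fix[];   // every fix that was applied, kept for debugging
  passes: number; // how many render/critique rounds ran
}

async function critiqueLoop<Scene, Fix>(
  composed: Scene,
  stages: CritiqueStages<Scene, Fix>,
  maxPasses = 1, // assumption: one automatic pass after compose
): Promise<CritiqueResult<Scene, Fix>> {
  let scene = composed;
  const applied: Fix[] = [];
  let passes = 0;

  while (passes < maxPasses) {
    const png = await stages.render(scene);
    const fixes = await stages.critique(png);
    passes += 1;
    if (fixes.length === 0) break; // nothing visibly wrong, stop early
    scene = stages.apply(scene, fixes);
    applied.push(...fixes);
  }

  return { scene, fixes: applied, passes };
}
```

Injecting the stages keeps the loop testable with stubs, and returning the applied fixes alongside the scene preserves the record of what the vision model flagged.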


The fix schema

type Fix =
  | { type: "move"; nodeId: string; position: [x: number, y: number, z: number]; reason: string }
  | { type: "scale"; nodeId: string; scale: number; reason: string }
  | { type: "recolor"; nodeId: string; color: string; reason: string }
  | { type: "remove"; nodeId: string; reason: string }
  | { type: "add"; name: string; geometry: string; position: [x: number, y: number, z: number]; reason: string }

Debuggable. You see exactly what the vision model thought was wrong.
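
To make that concrete, here is a hedged sketch of validating the model's reply against the schema and applying the result. It uses a hand-written type guard as a dependency-free stand-in for schema validation. `SceneNode`, `parseFixes`, and `applyFixes` are hypothetical, and two behaviors are assumptions: `scale` is treated as a multiplier, and added nodes get default scale and color.

```typescript
// Sketch only: hypothetical helpers, not the project's implementation.
type Vec3 = [x: number, y: number, z: number];

type Fix =
  | { type: "move"; nodeId: string; position: Vec3; reason: string }
  | { type: "scale"; nodeId: string; scale: number; reason: string }
  | { type: "recolor"; nodeId: string; color: string; reason: string }
  | { type: "remove"; nodeId: string; reason: string }
  | { type: "add"; name: string; geometry: string; position: Vec3; reason: string };

const isString = (v: unknown): v is string => typeof v === "string";
const isNumber = (v: unknown): v is number => typeof v === "number" && Number.isFinite(v);
const isVec3 = (v: unknown): v is Vec3 =>
  Array.isArray(v) && v.length === 3 && v.every(isNumber);

// Runtime check that mirrors the Fix union, variant by variant.
function isFix(v: unknown): v is Fix {
  if (typeof v !== "object" || v === null) return false;
  const f = v as Record<string, unknown>;
  if (!isString(f.reason)) return false;
  switch (f.type) {
    case "move":    return isString(f.nodeId) && isVec3(f.position);
    case "scale":   return isString(f.nodeId) && isNumber(f.scale);
    case "recolor": return isString(f.nodeId) && isString(f.color);
    case "remove":  return isString(f.nodeId);
    case "add":     return isString(f.name) && isString(f.geometry) && isVec3(f.position);
    default:        return false;
  }
}

// Parse the model's raw JSON reply; reject anything that is not a Fix[].
function parseFixes(raw: string): Fix[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data) || !data.every(isFix)) {
    throw new Error("vision model reply is not a valid Fix[]");
  }
  return data;
}

// Hypothetical scene shape: nodes keyed by id.
interface SceneNode { position: Vec3; scale: number; color: string; geometry: string }
type Scene = Record<string, SceneNode>;

// Returns a new scene; the input is left untouched.
function applyFixes(scene: Scene, fixes: Fix[]): Scene {
  const next: Scene = { ...scene };
  for (const fix of fixes) {
    if (fix.type === "add") {
      // Assumption: new nodes start at unit scale with a default color.
      next[fix.name] = { position: fix.position, scale: 1, color: "#ffffff", geometry: fix.geometry };
      continue;
    }
    const node = next[fix.nodeId];
    if (!node) continue; // ignore fixes that target unknown nodes
    switch (fix.type) {
      case "move":    next[fix.nodeId] = { ...node, position: fix.position }; break;
      case "scale":   next[fix.nodeId] = { ...node, scale: node.scale * fix.scale }; break; // assumed multiplier
      case "recolor": next[fix.nodeId] = { ...node, color: fix.color }; break;
      case "remove":  delete next[fix.nodeId]; break;
    }
  }
  return next;
}
```

Usage would look like `applyFixes(scene, parseFixes(reply))`. Because every variant carries a `reason`, logging the parsed array before applying it shows what the model flagged and why.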


$0.015

13-point quality jump (72% → 85%). 2–3 seconds. Runs automatically after compose.


The field is converging.

CMU's CADSmith (March 2026) uses a similar loop for single parts. Single parts are the first 5%. We're assembling entire machines, with streaming, parallel fabrication, and collaborative editing on top.