Part 3 of 7 | April 2026 | 5 min
Teaching AI to See Its Own Mistakes
VisionQuality
[Figure: Scene Critique panel. Quality score: 72%. Detected issues: lamp floating 2.3 m above the floor; table oversized (3.2× expected); chairs overlapping the table.]
The scene looked wrong and the AI didn't know.
30% of composed scenes have visible errors. The AI only has JSON; it can't see. A floating lamp. An oversized table. Parts clipping through each other.
The loop
[Diagram: Vision Critique Loop. The vision model returns z.array(Fix). $0.015 per run · 2–3 seconds · automatic.]
Compose → Canvas → Screenshot → Vision Model → Structured Fixes.
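One way to picture a single pass of this loop in TypeScript. This is a minimal sketch, not the actual implementation: `LoopDeps`, `runCritiquePass`, and the four injected functions are hypothetical names, and the renderer and vision model are passed in as plain async functions so that no particular rendering or model API is assumed.

```typescript
// Hypothetical dependency bundle: each field stands in for one stage of the loop.
interface LoopDeps<Scene, Fix> {
  // Render to Canvas + Screenshot (PNG): returns the PNG bytes.
  renderToPng(scene: Scene): Promise<Uint8Array>;
  // Vision Model: returns raw, untrusted output.
  askVisionModel(png: Uint8Array): Promise<unknown>;
  // Schema validation step (for example, where z.array(Fix) could be enforced).
  parseFixes(raw: unknown): Fix[];
  // Applies one fix and returns the updated scene without mutating the input.
  applyFix(scene: Scene, fix: Fix): Scene;
}

// Runs one critique pass over an already composed scene.
async function runCritiquePass<Scene, Fix>(
  scene: Scene,
  deps: LoopDeps<Scene, Fix>,
): Promise<{ scene: Scene; fixes: Fix[] }> {
  const png = await deps.renderToPng(scene);
  const raw = await deps.askVisionModel(png);
  const fixes = deps.parseFixes(raw);

  let current = scene;
  for (const fix of fixes) {
    current = deps.applyFix(current, fix);
  }
  // Returning the fixes alongside the scene keeps the model's reasoning inspectable.
  return { scene: current, fixes };
}
```

Injecting the stages keeps the pass easy to exercise with stubs, and it makes the validation step explicit: nothing reaches `applyFix` unless `parseFixes` accepted it.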
The fix schema
type Fix =
  | { type: "move"; nodeId: string; position: [x: number, y: number, z: number]; reason: string }
  | { type: "scale"; nodeId: string; scale: number; reason: string }
  | { type: "recolor"; nodeId: string; color: string; reason: string }
  | { type: "remove"; nodeId: string; reason: string }
  | { type: "add"; name: string; geometry: string; position: [x: number, y: number, z: number]; reason: string }
Debuggable. Every fix carries a reason, so you see exactly what the vision model thought was wrong.
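To illustrate how fixes of this shape might be applied and logged, here is a small sketch. The `SceneNode` shape, the `Scene` map, and `applyFixes` are assumptions made for illustration, since the real scene format is not shown here. Treating `scale` as a multiplier on the node's current scale, keying added nodes by `name`, and the default color are also assumptions.

```typescript
type Vec3 = [x: number, y: number, z: number];

type Fix =
  | { type: "move"; nodeId: string; position: Vec3; reason: string }
  | { type: "scale"; nodeId: string; scale: number; reason: string }
  | { type: "recolor"; nodeId: string; color: string; reason: string }
  | { type: "remove"; nodeId: string; reason: string }
  | { type: "add"; name: string; geometry: string; position: Vec3; reason: string };

// Hypothetical scene representation: node id -> node.
interface SceneNode {
  geometry: string;
  position: Vec3;
  scale: number;
  color: string;
}
type Scene = Map<string, SceneNode>;

function applyFixes(scene: Scene, fixes: Fix[]): { scene: Scene; log: string[] } {
  const next = new Map(scene); // shallow copy, so the input map is left untouched
  const log: string[] = [];

  for (const fix of fixes) {
    if (fix.type === "add") {
      // Assumption: new nodes are keyed by name, with scale 1 and a neutral grey.
      next.set(fix.name, {
        geometry: fix.geometry,
        position: fix.position,
        scale: 1,
        color: "#888888",
      });
      log.push(`add ${fix.name}: ${fix.reason}`);
      continue;
    }

    const node = next.get(fix.nodeId);
    if (!node) {
      log.push(`skip ${fix.type} ${fix.nodeId}: unknown node`);
      continue;
    }

    switch (fix.type) {
      case "move":
        next.set(fix.nodeId, { ...node, position: fix.position });
        break;
      case "scale":
        // Assumption: scale is a multiplier, not an absolute size.
        next.set(fix.nodeId, { ...node, scale: node.scale * fix.scale });
        break;
      case "recolor":
        next.set(fix.nodeId, { ...node, color: fix.color });
        break;
      case "remove":
        next.delete(fix.nodeId);
        break;
    }
    log.push(`${fix.type} ${fix.nodeId}: ${fix.reason}`);
  }
  return { scene: next, log };
}
```

Because `Fix` is a discriminated union on `type`, the compiler narrows each branch to the fields that variant actually has, and the returned `log` preserves every `reason` in the order the fixes were applied.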
$0.015
13-point quality jump (72% → 85%). 2–3 seconds. Runs automatically after compose.
The field is converging.
CMU's CADSmith (March 2026) uses a similar loop for single parts. Single parts are the first 5%; we're assembling entire machines, with streaming, parallel fabrication, and collaborative editing on top.