A deterministic repair engine for AI-generated Gutenberg markup.
The Problem
As I began integrating Large Language Models (LLMs) into WordPress content workflows, I hit a recurring friction point: Hallucinated Syntax.
While models like ChatGPT 5 or Claude are excellent at writing content, they are notoriously bad at adhering to the strict JSON-comment syntax required by the WordPress Block Editor (Gutenberg). Common failures include:
- Malformed JSON in block attributes (missing braces, wrong quotes).
- Non-canonical delimiters (using
/improperly in self-closing blocks). - “Stray” closing tags or unclosed block comments.
When you paste this “dirty” code into WordPress, the editor crashes or triggers the dreaded “This block contains unexpected or invalid content” recovery mode.
The Solution
I built Blocks Cleaner, a client-side utility that acts as a middleware between the AI and the WordPress Editor. It parses raw, potentially broken text, identifies syntax violations, and reconstructs a valid Block Tree before the user ever copies it.

How It Works (The Engineering)
Instead of relying on simple Regex find/replace (which is fragile for nested recursive structures), I took a hybrid approach:
- Heuristic Pre-processing:I wrote a custom tokenizer that scans for common “LLMisms”—like “ (invalid self-closing) or malformed JSON objects inside comments. It attempts JSON.parse on attribute strings and, if that fails, runs a “loose JSON” normalizer to fix quote errors.
- Official Parser Integration:The core of the application utilizes the actual @wordpress/block-serialization-spec-parser. I extracted this package from the WordPress core, bundled it using Webpack/Browserify to remove Node.js dependencies, and exposed it to the browser.
- Why this matters: By using the official parser, I ensure 100% compatibility with how WordPress actually reads content.
- Recursive Serialization:Once the block tree is successfully parsed and patched, the tool re-serializes the tree back into valid HTML comments, ensuring that every delimiter, attribute, and closing tag is perfectly strictly formatted.
Key Features
- Auto-Close Detection: Automatically detects and closes unclosed block tags based on the stack depth.
- JSON Linter: repairs broken JSON syntax within block comments (e.g., converting single quotes to double quotes).
- Visual Diffing: Provides a “Show Highlighted Changes” view so the user understands exactly what was fixed.
- Client-Side Privacy: The entire parsing logic runs in the browser via a bundled JS file; no data is sent to a server.
What I Learned
Building this highlighted the importance of Neuro-Symbolic AI systems. LLMs are probabilistic engines—they guess the next token. Parsers are symbolic engines—they enforce strict rules. To build reliable AI workflows, we cannot just “prompt better”; we must build symbolic guardrails (like this parser) that sanitize AI output before it touches production systems.
Technical Stack
- Core: JavaScript (ES6+)
- Parsing:
@wordpress/block-serialization-spec-parser(ported to browser) - Bundling: Webpack/Browserify
- UI: Tailwind CSS
