Early in my journey, I fell into the “Mega-Prompt” trap. I tried to force an LLM to read a blog post, understand its sentiment, and generate a pixel-perfect DALL-E prompt in a single, zero-shot request.
Looking at my old prompts makes me smile.
That approach failed 80% of the time. The AI would either hallucinate the blog details or ignore my style guidelines entirely.
The fix wasn’t writing a “better” prompt. The fix was changing the architecture. I moved from Chain of Thought (asking one model to think step-by-step) to Chain of Execution (forcing the process into distinct, isolated stages).
The Concept
To solve this, I treated the AI not as a solo genius, but as a production studio. I decomposed the task into three distinct agents, each with a narrow definition of done.
1. Agent A: The Director (Analysis)
- Input: The raw blog post text.
- Goal: Extract the core subject, mood, and abstract concepts.
- Constraint: It is forbidden from writing image prompts. It outputs raw JSON data only.
- Output:
{"subject": "server optimization", "mood": "calm, technical", "metaphor": "a tidy warehouse"}
2. Agent B: The Artist (Synthesis)
- Input: The Director’s JSON output + My “NoumanLabs” Style Guide.
- Goal: Translate the data into a strictly formatted Midjourney/DALL-E prompt.
- Constraint: It must adhere to my color palette and “flat vector” style rules.
- Output:
Flat vector illustration of a tidy warehouse, server racks, olive green and cream palette, minimal style --ar 16:9
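A rough sketch of the Artist stage, assuming the style guide lives in a plain text file and reusing the same hypothetical `call_llm` helper. The template and field names are examples, not my exact production prompt.

```python
# Sketch of the Artist stage: Director JSON + style guide in, image prompt out.
ARTIST_SYSTEM_TEMPLATE = (
    "You are a prompt engineer for flat vector illustrations.\n"
    "Follow this style guide exactly:\n{style_guide}\n"
    "Output a single image prompt ending with --ar 16:9. No commentary."
)

def run_artist(director_json: dict, style_guide: str, call_llm) -> str:
    system = ARTIST_SYSTEM_TEMPLATE.format(style_guide=style_guide)
    user = (
        f"Subject: {director_json['subject']}\n"
        f"Mood: {director_json['mood']}\n"
        f"Metaphor: {director_json['metaphor']}"
    )
    return call_llm(system=system, user=user).strip()
```

Notice the Artist never sees the blog post itself; it only sees the structured brief and the style rules.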
3. Agent C: The Critic (Verification – Optional)
- Input: The Artist’s prompt.
- Goal: Check for banned words or conflicting styles before spending API credits on generation.
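In practice, the Critic can be as cheap as a deterministic gate; no model call needed before you spend generation credits. The banned and required terms below are examples; tune them to your own style guide.

```python
# Sketch of the Critic as a rule-based gate. Lists are illustrative examples.
BANNED = {"photorealistic", "3d render", "neon"}
REQUIRED = {"flat vector", "--ar 16:9"}

def run_critic(prompt: str) -> list[str]:
    lower = prompt.lower()
    problems = [f"banned term: {w}" for w in BANNED if w in lower]
    problems += [f"missing required token: {t}" for t in REQUIRED if t not in lower]
    return problems  # empty list means the prompt is safe to send
```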
The Key Learning
Reliability doesn’t come from a smarter model; it comes from a narrower scope.
By decomposing the task, I reduced the “cognitive load” on each agent. The Director doesn’t need to know about colors, and The Artist doesn’t need to read the blog post.
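Wiring the stages together makes the isolation explicit. This sketch reuses the hypothetical functions above; each one receives only the narrow input it needs and nothing more.

```python
# Sketch of the chain of execution: each stage is isolated and hands off
# a small, well-defined artifact to the next.
def generate_image_prompt(blog_post: str, style_guide: str, call_llm) -> str:
    brief = run_director(blog_post, call_llm)          # sees only the post
    prompt = run_artist(brief, style_guide, call_llm)  # sees only JSON + style guide
    issues = run_critic(prompt)                        # sees only the prompt
    if issues:
        raise RuntimeError(f"Prompt rejected before generation: {issues}")
    return prompt
```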
Result: My success rate went from 20% to 85%, and the visual consistency across the blog is now automatic.
