How the AI prompt enhancer works, end to end.
The gap between the keystroke you just pressed and the rewritten prompt on your screen is about two seconds. This page unpacks the eight steps in between, the router that picks the model, and the three dimensions that define quality.
Eight steps from keystroke to coach card.
Watch the active step walk the chain below — each card lights up as the mock request passes through it.
- 01 · Watcher starts on supported chats
  When you open ChatGPT, Claude, Gemini or Perplexity, a content script finds the chat input and attaches a silent watcher. The watcher does nothing until you actually type; it just holds a reference.
- 02 · 1.3 s debounce
  Every keystroke resets a debounce timer. Only after 1,300 ms of silence does the extension consider running. This is calibrated to match how people actually write prompts: you type a thought, pause, re-read, then continue.
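  A minimal sketch of the debounce, assuming a content-script context; `maybeClassify` and the wiring are illustrative names, while the 1,300 ms window is the documented one.
  ```ts
  // Debounce sketch: every keystroke resets the timer; the classify path
  // only runs after 1,300 ms of silence.
  const DEBOUNCE_MS = 1_300;
  let timer: number | undefined;

  function onInput(inputEl: HTMLTextAreaElement): void {
    clearTimeout(timer);                  // any keystroke cancels the pending run
    timer = window.setTimeout(() => {
      void maybeClassify(inputEl.value);  // fires only after 1.3 s of quiet
    }, DEBOUNCE_MS);
  }

  declare function maybeClassify(prompt: string): Promise<void>; // steps 03–07
  ```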
- 03 · 15-character gate
  If the prompt is under 15 characters, we skip the round-trip entirely. There is nothing useful a classifier can say about "hi" or "ok".
- 04 · Prompt de-duplication
  Before sending, we hash the prompt and compare against the last one we processed. Identical prompts never go to the network a second time.
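  Steps 03 and 04 as two early returns, continuing the hypothetical `maybeClassify` above. SHA-256 via Web Crypto is an assumed choice of hash; any stable digest would do.
  ```ts
  const MIN_CHARS = 15;
  let lastHash = "";

  async function maybeClassify(prompt: string): Promise<void> {
    if (prompt.trim().length < MIN_CHARS) return; // step 03: nothing to say about "hi"

    const digest = await crypto.subtle.digest(
      "SHA-256", new TextEncoder().encode(prompt));
    const hash = Array.from(new Uint8Array(digest),
      (b) => b.toString(16).padStart(2, "0")).join("");
    if (hash === lastHash) return;                // step 04: same prompt, skip the network
    lastHash = hash;

    await classify(prompt);                       // steps 05–08
  }

  declare function classify(prompt: string): Promise<void>;
  ```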
- 05 · Abort previous in-flight request
  If you kept typing while an earlier classify was running, the extension aborts it in the background. A generation counter guarantees that only the latest response can update the UI: no flicker, no stale scores.
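  A sketch of the abort-and-generation pattern; `classifyRequest` and `renderCoachCard` are stand-in names.
  ```ts
  let inflight: AbortController | null = null;
  let generation = 0;

  async function classify(prompt: string): Promise<void> {
    inflight?.abort();                    // cancel the previous classify, if any
    inflight = new AbortController();
    const myGen = ++generation;

    try {
      const result = await classifyRequest(prompt, inflight.signal);
      if (myGen === generation) renderCoachCard(result); // only the latest wins
    } catch (err) {
      if ((err as Error).name !== "AbortError") throw err; // aborts are expected
    }
  }

  declare function classifyRequest(prompt: string, signal: AbortSignal): Promise<unknown>;
  declare function renderCoachCard(result: unknown): void;
  ```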
- 06 · LRU cache lookup
  The extension keeps a 120-entry LRU cache in the background with a 5-minute TTL. Cache hits resolve in well under 5 ms, which is why repeat prompts feel instant.
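  One way to get the 120-entry, 5-minute-TTL behavior with a plain `Map`, which iterates in insertion order; the real implementation may differ.
  ```ts
  const MAX_ENTRIES = 120;
  const TTL_MS = 5 * 60_000;
  const cache = new Map<string, { value: unknown; expires: number }>();

  function cacheGet(key: string): unknown {
    const hit = cache.get(key);
    if (!hit || hit.expires < Date.now()) {
      cache.delete(key);                  // missing or expired
      return undefined;
    }
    cache.delete(key);                    // re-insert to mark most recently used
    cache.set(key, hit);
    return hit.value;
  }

  function cacheSet(key: string, value: unknown): void {
    if (cache.size >= MAX_ENTRIES) {
      cache.delete(cache.keys().next().value!); // evict the oldest entry
    }
    cache.set(key, { value, expires: Date.now() + TTL_MS });
  }
  ```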
- 07 · Classify request to our API
  On a miss, the extension sends the prompt to our API. Identical in-flight requests are deduplicated there too, so multiple tabs asking the same question share the same underlying model call.
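  A server-side sketch of that de-duplication: concurrent identical prompts share one promise, so one model call serves every waiting tab. Names and the response type are assumptions.
  ```ts
  interface Classification {
    category: string;
    scores: Record<string, number>;
    rewrite: string;
  }

  const inflightByHash = new Map<string, Promise<Classification>>();

  function classifyOnce(hash: string, prompt: string): Promise<Classification> {
    const existing = inflightByHash.get(hash);
    if (existing) return existing;        // another tab already asked this

    const pending = callModel(prompt)
      .finally(() => inflightByHash.delete(hash)); // clean up once settled
    inflightByHash.set(hash, pending);
    return pending;
  }

  declare function callModel(prompt: string): Promise<Classification>;
  ```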
- 08 · Router picks the model, UI renders
  Our routing layer picks the right tier and the right healthy provider, then returns a category, a 1–5 score for each dimension, and a rewritten prompt. The coach dot opens into a glass card and you can apply the rewrite in one click.
Complexity becomes a tier. A tier becomes a model.
Every prompt gets three cheap signals measured before the model is even picked: length, code-likeness and line count. They combine into a complexity score that maps directly to a minimum model tier.
| Signal | Low · +0 | Mid · +1 | High · +2 |
|---|---|---|---|
| Prompt length (chars) | ≤ 80 | 81–400 | > 400 |
| Code-like regex hits | None | 1–2 | 3+ |
| Line count | 1–3 | 4–10 | > 10 |
Complexity ≤ 2
Short, plain prompts. Routed to the fastest small models on Groq. Typical round-trip 400–800 ms.
Complexity = 3
Medium prompts with some structure. Routed to mid-sized Gemini or Groq models with stronger reasoning.
Complexity > 3
Long, code-heavy or multi-part prompts. Routed to the larger models on Gemini or Hugging Face.
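A sketch of the scoring and tier mapping implied by the table and cards above. The thresholds are the documented ones; the code-likeness regex is a stand-in assumption, since the real pattern isn't published here.
```ts
function complexity(prompt: string): number {
  const len = prompt.length;
  const lines = prompt.split("\n").length;
  const codeHits =
    (prompt.match(/=>|;|[{}]|\b(function|class|import|def)\b/g) ?? []).length;

  const lenScore = len > 400 ? 2 : len > 80 ? 1 : 0;   // ≤ 80 · 81–400 · > 400
  const codeScore = codeHits >= 3 ? 2 : codeHits >= 1 ? 1 : 0;
  const lineScore = lines > 10 ? 2 : lines > 3 ? 1 : 0;
  return lenScore + codeScore + lineScore;             // 0–6
}

function minTier(score: number): 1 | 2 | 3 {
  if (score <= 2) return 1;  // fastest small Groq models
  if (score === 3) return 2; // mid-sized Gemini / Groq
  return 3;                  // larger Gemini / Hugging Face models
}
```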
Eight intent buckets. Each gets a different rewrite plan.
Before rewriting, the optimizer checks which task bucket your prompt belongs in. The bucket decides which clarifying questions are actually worth asking.
| Intent | Heuristic signals | Rewrite plan |
|---|---|---|
| coding | function, class, error, stack trace, fix this | Ask about language, runtime, input shape, expected output. |
| writing | write, draft, rewrite, tone, audience | Ask about length, tone, audience, format. |
| image | image, illustration, photo, midjourney, flux | Ask about aspect ratio, style, lighting, subject. |
| video | video, clip, scene, runway, sora | Ask about duration, camera, motion, setting. |
| audio | song, music, voice, tts, jingle | Ask about genre, mood, duration, voice type. |
| research | compare, sources, cite, literature | Ask about depth, citations, recency, format. |
| analysis | analyse, evaluate, breakdown, metric | Ask about data source, axes, decision context. |
| general | (fallback) | Ask about audience and desired output format. |
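A keyword-first matcher built from the signal column above; the real heuristics are richer, so treat these regexes as assumptions.
```ts
const INTENT_SIGNALS: Record<string, RegExp> = {
  coding:   /\b(function|class|error|stack trace|fix this)\b/i,
  writing:  /\b(write|draft|rewrite|tone|audience)\b/i,
  image:    /\b(image|illustration|photo|midjourney|flux)\b/i,
  video:    /\b(video|clip|scene|runway|sora)\b/i,
  audio:    /\b(song|music|voice|tts|jingle)\b/i,
  research: /\b(compare|sources|cite|literature)\b/i,
  analysis: /\b(analyse|analyze|evaluate|breakdown|metric)\b/i,
};

function intentBucket(prompt: string): string {
  for (const [intent, signal] of Object.entries(INTENT_SIGNALS)) {
    if (signal.test(prompt)) return intent; // first matching bucket wins
  }
  return "general"; // fallback: ask about audience and output format
}
```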
Health-aware, ordered, silent.
The router tries the primary model first. If it times out, returns an error, or is flagged unhealthy by the sliding-window tracker, it is skipped for a short cooldown and the next fallback in the same tier takes over. The caller sees one answer and never hears about the retries.
- Sliding-window success + failure counters per model.
- Cooldown on rate-limit (429) and repeated 5xx.
- Per-model timeout enforced by the router, not the provider.
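A sketch of the fallback loop those three rules imply; `isHealthy`, `recordFailure` and the candidate list are illustrative names.
```ts
interface ModelHandle { id: string; timeoutMs: number; }

async function routeWithFallback(candidates: ModelHandle[], prompt: string): Promise<string> {
  for (const model of candidates) {
    if (!isHealthy(model.id)) continue; // skipped while cooling down
    try {
      return await withTimeout(call(model, prompt), model.timeoutMs);
    } catch (err) {
      recordFailure(model.id, err);     // 429s and repeated 5xx start a cooldown
    }
  }
  throw new Error("all providers in tier unavailable");
}

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("router timeout")), ms)),
  ]);
}

declare function isHealthy(modelId: string): boolean;
declare function recordFailure(modelId: string, err: unknown): void;
declare function call(model: ModelHandle, prompt: string): Promise<string>;
```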
- groq/llama-8b · T2 · 620 ms
- groq/mixtral-8x7b · T2 · 780 ms
- gemini/flash-1.5 · T2 · 940 ms
- hf/mistral-7b · T3 · 1420 ms
What we actually score.
The overall 1–5 maturity score is a weighted blend of three sub-scores. You see the blend in the coach dot; you see the components in the details card. The live demo below cycles through a vague → refined → sharp rewrite so you can watch the per-dimension bars move in real time.
A vague prompt is weak on all three: no topic, no audience, no output shape.
Specificity
Does the prompt name the exact language, framework, audience or constraint?
Context richness
Does it share enough background, such as inputs, prior code, and edge cases, for the model to do its best work?
Output format
Have you told the model what shape of answer you want — code, JSON, table?
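How the blend might look in code. Only the 1–5 scale and the three components come from the text above; the equal weights are a loud assumption, since the real weights are not published.
```ts
interface SubScores { specificity: number; context: number; format: number; }

function maturity({ specificity, context, format }: SubScores): number {
  const w = { specificity: 1 / 3, context: 1 / 3, format: 1 / 3 }; // assumed weights
  const blended =
    specificity * w.specificity + context * w.context + format * w.format;
  return Math.round(blended * 10) / 10; // e.g. scores 2, 3, 1 blend to 2
}
```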
Three caches, one router, zero wasted traffic.
Cache hit
< 5 ms
Identical prompts from the last five minutes never touch the network.
Tier-1 models
~ 400–800 ms
Short and mid prompts on the fastest Groq models. You barely see the spinner.
Tier-3 models
up to ~ 3 s
Reserved for long, code-heavy prompts where quality matters more than speed.
Latency numbers are typical ranges across recent production requests; exact values depend on provider and region.
See it fire on your own prompt.
The live demo calls the exact same classify API the extension uses. Paste a prompt and watch the pipeline in action.
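If you would rather script it, a hypothetical call shape follows; the endpoint URL and field names are placeholders, not a published contract.
```ts
async function classifyPrompt(prompt: string): Promise<void> {
  const res = await fetch("https://api.example.com/v1/classify", { // placeholder URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const { category, scores, rewrite } = await res.json();
  console.log(category, scores, rewrite); // mirrors what the coach card renders
}
```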