Engineering-led features

Features that make it the fastest AI prompt enhancer.

Every feature on this page is backed by code shipping today — in the extension and in our routing layer. Skim the cards, or open the deep-dive for the algorithm details.

Context-aware refinement

DOM-scraped thread context · last 6 turns · 8k cap

The only prompt enhancer that reads your ongoing chat. Before a rewrite, the extension scans each platform's conversation DOM, extracts up to the last 6 turns (user + assistant), caps the total at 8,000 characters, and ships that context to the optimizer. The result: a vague mid-thread follow-up like "make it shorter" becomes a precise rewrite that actually references what you were just talking about.

Context
```ts
const ctx = getConversationContext();
// up to 6 turns, 2,000 chars each, 8,000-char total cap
// attached to the optimize request for the LLM
```
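The cap logic above can be sketched as a pure function. This is an illustrative sketch, not the shipped code — `capTurns` and the `Turn` shape are assumed names; the real extension feeds it turns scraped from each platform's conversation DOM.

```typescript
// Sketch of the turn cap: keep the most recent turns, then drop the
// oldest of those until the total fits the character budget.
interface Turn { role: "user" | "assistant"; text: string }

const MAX_TURNS = 6;
const MAX_TOTAL_CHARS = 8000;

function capTurns(turns: Turn[], maxTurns = MAX_TURNS, maxChars = MAX_TOTAL_CHARS): Turn[] {
  // Only the last N turns are candidates at all.
  const recent = turns.slice(-maxTurns);
  // Trim from the oldest end until the total is under the cap,
  // always keeping at least the most recent turn.
  let total = recent.reduce((sum, t) => sum + t.text.length, 0);
  while (recent.length > 1 && total > maxChars) {
    total -= recent.shift()!.text.length;
  }
  return recent;
}
```

Dropping oldest-first means a long thread degrades gracefully: the rewrite always sees the turns closest to your follow-up.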

Real-time auto coaching

Debounced 1.3 s watcher, 15-char gate

A silent watcher sits on the chat input of each supported platform. 1.3 seconds after you stop typing, and only if your prompt is at least 15 characters, the extension asks for a score. Nothing leaves the page until you pause.

Watcher
```ts
if (trimmedLength < 15) return;
scheduleWith(DEBOUNCE_MS /* 1300 */, () => classify(prompt));
```
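A complete debounce looks roughly like this. It is a minimal sketch under stated assumptions: `makeWatcher` is a hypothetical name, and `classify` stands in for the scoring call; the constants mirror the values above.

```typescript
// Minimal debounced watcher: each keystroke resets the clock, and
// nothing fires until the 15-char gate is passed AND 1.3 s of silence.
const DEBOUNCE_MS = 1300;
const MIN_CHARS = 15;

function makeWatcher(classify: (prompt: string) => void) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (prompt: string) => {
    if (timer !== undefined) clearTimeout(timer); // keystroke resets the clock
    if (prompt.trim().length < MIN_CHARS) return; // too short to score yet
    timer = setTimeout(() => classify(prompt), DEBOUNCE_MS);
  };
}
```

Clearing the pending timer before the length check matters: shortening a prompt below the gate also cancels any score already scheduled.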

LRU cache, 5-minute TTL

120 entries on the client, <5 ms hits

Classifications are keyed by a normalised hash of the prompt. A 120-entry LRU cache with a 5-minute TTL means repeated prompts — and they are very common — return almost instantly.

Cache
```ts
const key = hash(normalize(prompt));
const hit = cache.get(key);
if (hit && fresh(hit)) return hit.value;
```
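The cache itself fits in a few lines on top of `Map`'s insertion-order iteration. A sketch, assuming the 120-entry / 5-minute numbers above; the class name and method shapes are illustrative.

```typescript
// 120-entry LRU with a 5-minute TTL. A Map iterates in insertion
// order, so the first key is always the least recently used.
const MAX_ENTRIES = 120;
const TTL_MS = 5 * 60 * 1000;

interface Entry<V> { value: V; at: number }

class LruCache<V> {
  private map = new Map<string, Entry<V>>();

  get(key: string): V | undefined {
    const hit = this.map.get(key);
    if (!hit) return undefined;
    if (Date.now() - hit.at > TTL_MS) {   // stale: expire on read
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key);                 // re-insert to mark as MRU
    this.map.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, { value, at: Date.now() });
    if (this.map.size > MAX_ENTRIES) {
      // Evict the least recently used entry (first in insertion order).
      this.map.delete(this.map.keys().next().value as string);
    }
  }
}
```

Expiring lazily on read keeps the hot path allocation-free; there is no background sweep to schedule or tear down.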

Multi-AI model routing

Groq · Gemini · Hugging Face across 3 tiers

Our routing layer keeps a registry of models across tier-1, tier-2 and tier-3. A complexity score derived from your prompt's length, code-likeness and line count picks the minimum tier that can answer well, then routes to the fastest healthy model in that tier.

Router
```ts
const complexity = score(prompt);
const minTier = complexity <= 2 ? 1 : complexity <= 3 ? 2 : 3;
const model = pickFastestHealthy(registry, minTier);
```
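What might `score` look like? The exact weights aren't published, so this is a hypothetical heuristic showing the shape: bump complexity for length, line count and code-likeness, then map the score to a tier exactly as above.

```typescript
// Illustrative complexity heuristic; the thresholds and regex are
// assumptions, not the shipped values.
function score(prompt: string): number {
  let s = 1;
  if (prompt.length > 280) s++;                       // longer prompt
  if (prompt.length > 1200) s++;                      // much longer prompt
  if (prompt.split("\n").length > 8) s++;             // many lines
  if (/```|\bfunction\b|\bclass\b|[{};]/.test(prompt)) s++; // looks like code
  return Math.min(s, 5);
}

// Same tier mapping as the router snippet above.
function minTierFor(complexity: number): 1 | 2 | 3 {
  return complexity <= 2 ? 1 : complexity <= 3 ? 2 : 3;
}
```

A short, plain prompt scores 1 and stays on tier 1; only long, code-heavy prompts pay the latency cost of a tier-3 model.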

Health-aware fallback chain

Sliding-window success + cooldown

Every model call is recorded in a sliding window. When a provider starts failing or gets rate-limited, the router skips it for a cooldown period and tries the next healthy fallback — so one bad provider never takes the whole extension down.

Health
```ts
for (const model of fallbackChain) {
  if (health(model).onCooldown) continue;  // skip providers on cooldown
  try { return await call(model); }
  catch (e) { mark(model, e); }            // record the failure, try the next
}
```
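The health tracker behind `onCooldown` can be sketched as a ring of recent results plus a breaker timestamp. The window size, failure threshold and cooldown length here are illustrative, not the shipped values.

```typescript
// Sliding-window health: remember the last N call outcomes; if the
// failure rate trips the threshold, put the model on cooldown.
const WINDOW = 20;          // last 20 calls
const FAIL_THRESHOLD = 0.5; // >50% failures trips the breaker
const COOLDOWN_MS = 60_000; // skip the model for a minute

class ModelHealth {
  private results: boolean[] = [];
  private cooldownUntil = 0;

  record(ok: boolean, now = Date.now()): void {
    this.results.push(ok);
    if (this.results.length > WINDOW) this.results.shift();
    const fails = this.results.filter((r) => !r).length;
    // Require a few samples so one early hiccup can't trip the breaker.
    if (this.results.length >= 5 && fails / this.results.length > FAIL_THRESHOLD) {
      this.cooldownUntil = now + COOLDOWN_MS;
    }
  }

  onCooldown(now = Date.now()): boolean {
    return now < this.cooldownUntil;
  }
}
```

Because the window slides, a provider that recovers starts winning calls again as soon as its cooldown lapses and fresh successes dilute the old failures.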

Follow-up Q&A optimizer

Intent-aware clarifying questions

A lightweight heuristic detects whether your prompt is about coding, writing, image, video, audio, research or analysis. The optimizer then asks the 2–3 clarifying questions that actually move the needle for that intent before rewriting.

Intent
```ts
const intent = detectIntent(prompt);
const questions = questionsFor(intent);
return ask(questions).then(rewrite);
```
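A lightweight `detectIntent` can be first-match keyword buckets. The keyword lists below are illustrative guesses, not the real heuristic; anything that matches no bucket falls back to writing.

```typescript
// First-match keyword routing over the seven intents named above.
type Intent =
  | "coding" | "writing" | "image" | "video"
  | "audio" | "research" | "analysis";

const KEYWORDS: [Intent, RegExp][] = [
  ["coding",   /\b(code|function|bug|typescript|python|api)\b/i],
  ["image",    /\b(image|photo|illustration|logo)\b/i],
  ["video",    /\b(video|clip|footage)\b/i],
  ["audio",    /\b(audio|song|voice|podcast)\b/i],
  ["research", /\b(research|sources|papers|cite)\b/i],
  ["analysis", /\b(analy[sz]e|compare|dataset|trends)\b/i],
];

function detectIntent(prompt: string): Intent {
  for (const [intent, re] of KEYWORDS) {
    if (re.test(prompt)) return intent;
  }
  return "writing"; // default bucket when nothing matches
}
```

Bucket order doubles as a priority: a prompt mentioning both code and an image is treated as coding, which is usually the safer bet for clarifying questions.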

Live scoring

Watch the three signals score in real time.

This is the exact coaching loop the extension runs as you type — with each prompt rewrite, the bars for specificity, context and format clarity settle on new targets.

chat.ai / prompt maturity: Vague · 1.0/5

Weak on all three — no topic, no audience, no output shape.

  • Specificity (1.0/5) — does the prompt name the exact language, framework, audience or constraint?
  • Context richness (1.0/5) — does it share enough background — inputs, prior code, edge cases — for the model to do its best work?
  • Output format (1.0/5) — have you told the model what shape of answer you want — code, JSON, table?

Architecture

Three providers, one router, zero downtime.

Our routing layer keeps a registry of models across Groq, Gemini and Hugging Face, each with its own tier, timeout and retry policy. The router picks the right tier for the prompt, the health tracker skips the unhealthy, and the fallback chain keeps going until one model answers.

  • Groq — tier 1 & 2 low-latency models
  • Google Gemini — tier 2 & 3 reasoning models
  • Hugging Face Inference — tier-2/3 fallback
  • Per-model timeouts, cooldowns and retries
Read the full pipeline deep-dive
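Concretely, a registry entry per model could carry its tier, timeout and retry policy, and the picker filters by tier and health before sorting by speed. A sketch under assumptions: the model names, numbers and field names here are illustrative, not the shipped configuration.

```typescript
// Hypothetical registry: one entry per model, with per-model policy.
interface ModelEntry {
  provider: "groq" | "gemini" | "huggingface";
  model: string;
  tier: 1 | 2 | 3;
  timeoutMs: number;
  maxRetries: number;
  avgLatencyMs: number; // rolling average, used to rank by speed
}

const registry: ModelEntry[] = [
  { provider: "groq",        model: "llama-3.1-8b-instant", tier: 1, timeoutMs: 4_000,  maxRetries: 2, avgLatencyMs: 400 },
  { provider: "gemini",      model: "gemini-1.5-flash",     tier: 2, timeoutMs: 8_000,  maxRetries: 1, avgLatencyMs: 900 },
  { provider: "huggingface", model: "inference-fallback",   tier: 3, timeoutMs: 12_000, maxRetries: 1, avgLatencyMs: 1500 },
];

// Fastest healthy model at or above the minimum tier; the health
// predicate defaults to "everything healthy" for illustration.
function pickFastestHealthy(
  entries: ModelEntry[],
  minTier: number,
  healthy: (m: ModelEntry) => boolean = () => true,
): ModelEntry | undefined {
  return entries
    .filter((m) => m.tier >= minTier && healthy(m))
    .sort((a, b) => a.avgLatencyMs - b.avgLatencyMs)[0];
}
```

Filtering by `tier >= minTier` is what makes the fallback chain free: when every model in the ideal tier is on cooldown, the same picker naturally falls through to the next tier up.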
[Interactive demo: the router classifies a prompt as Short (T1), Medium (T2) or Long & code-heavy (T3); router.pick() then dispatches to Groq (active), Gemini or Hugging Face. Status shown: 3/3 healthy, a Short · T1 prompt served by Groq in 412 ms.]
Free forever

Every feature, on every prompt you type.

Install the extension once, and all six features follow you across ChatGPT, Claude, Gemini and Perplexity.