
Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git-lrc to help devs discover the project. Do give it a try and share your feedback for improving the project.
You might not have noticed, but Chrome quietly started shipping a local AI model called Gemini Nano bundled right into the browser.
No API keys. No cloud round-trips. No per-token cost. It just runs on your machine.
The interface to talk to it is called the Prompt API, and it landed in Chrome 138.
I spent some time going through the full API surface and built a playground that lets you experiment with every feature (session management, streaming, structured output, multimodal input, response prefixing, and more) on one page.
This post walks you through all of it.
Why does this matter?
On-device AI flips the usual tradeoffs:
- Free at runtime — the model runs on the user's hardware, not your servers
- Private by default — no data leaves the device once the model is downloaded
- Works offline — after the initial download, no network required
- Low latency — no round-trip to a data centre
The catch is that Gemini Nano is a small model.
It's great for classification, summarization, Q&A on focused content, and structured extraction.
It won't replace GPT-4 for complex reasoning.
Think of it as a smart, free, always-available layer you can add on top of your existing product.
Enabling the API
The Prompt API isn't on by default in all Chrome builds. Enable two flags:
Step 1 — Go to chrome://flags/#optimization-guide-on-device-model and set it to Enabled BypassPerfRequirement.
Step 2 — Go to chrome://flags/#prompt-api-for-gemini-nano and enable both the base API and the multimodal option.
Relaunch Chrome.
Then visit chrome://on-device-internals to check the model download status. First use will trigger a download — Gemini Nano is a few gigabytes.
The Playground
I put together a single-file HTML playground that covers the entire API surface.
Clone it and open playground.html directly in Chrome: no build step, no server.
git clone https://github.com/lovestaco/gemini-brow
Then open playground.html in Chrome 138+.
Session Setup and Context Window
Everything starts with a session.
You create one with LanguageModel.create(), optionally passing a system prompt and expected input/output modalities.
const session = await LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You are a helpful and friendly assistant.' }
  ],
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
Always call LanguageModel.availability() with the same options you'll pass to create() before creating a session, because the model may not support certain modalities on every device.
const avail = await LanguageModel.availability({
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
// 'available' | 'downloadable' | 'downloading' | 'unavailable'
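Each of those four states calls for slightly different handling. Here is a minimal sketch of one way to do it, assuming the `monitor` option of create() reports download progress through a `downloadprogress` event whose `loaded` field is a fraction between 0 and 1. The helper name `createSessionIfPossible` and the `onProgress` callback are my own illustrative names, and `api` stands in for the global `LanguageModel` so the function can also be exercised outside Chrome:

```javascript
// Illustrative helper: resolves to a session, or null when the model cannot run.
// `api` is expected to look like the global LanguageModel object.
async function createSessionIfPossible(api, options = {}, onProgress = () => {}) {
  if (!api || typeof api.availability !== 'function') {
    return null; // Prompt API is not exposed in this browser
  }

  const state = await api.availability(options);
  if (state === 'unavailable') {
    return null; // device or requested modalities are not supported
  }

  // 'available' creates right away; 'downloadable' and 'downloading'
  // resolve once the model download has finished.
  return api.create({
    ...options,
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => onProgress(e.loaded));
    },
  });
}
```

In a page this would be called as createSessionIfPossible(globalThis.LanguageModel, options, (p) => console.log(p)), preferably from a click handler, since starting the download may require a user gesture.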
Each session has a context window: a token budget that tracks everything in the conversation.
When it fills up, the oldest prompt/response pairs are dropped (but the system prompt is never dropped).
You can monitor it:
console.log(`${session.contextUsage} / ${session.contextWindow} tokens used`);
session.addEventListener('contextoverflow', () => {
  console.log('Oldest turns are being dropped to make room');
});
The playground shows a live progress bar for context usage, and a warning badge on overflow.
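A progress bar like that only needs the two numbers above. This is a rough sketch of the idea, not the playground's actual code; the helper name `contextStatus` and the 80% warning threshold are assumptions of mine, not something the API defines:

```javascript
// Illustrative helper: summarises how full a session's context window is.
// `warnAt` is an assumed threshold, not part of the Prompt API.
function contextStatus(session, warnAt = 0.8) {
  const used = session.contextUsage;
  const total = session.contextWindow;
  const ratio = total > 0 ? used / total : 0;
  return {
    used,
    total,
    percent: Math.round(ratio * 100),
    nearLimit: ratio >= warnAt,
  };
}
```

Calling contextStatus(session) after each prompt gives you a percent value to feed into a progress element and a nearLimit flag for showing a warning before turns start getting dropped.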
You can also clone a session to fork the conversation at a point in time — the clone is fully independent and won't see future messages sent to the original.
const forkedSession = await session.clone();
And destroy it when you're done to free resources:
session.destroy();
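Cloning and destroying pair up nicely when you want to try something without polluting the main conversation. A small sketch of that pattern, where `withFork` is a hypothetical helper name of mine:

```javascript
// Illustrative helper: runs `fn` against a throwaway clone of `session`
// and always destroys the clone afterwards, even if `fn` throws.
async function withFork(session, fn) {
  const fork = await session.clone();
  try {
    return await fn(fork);
  } finally {
    fork.destroy();
  }
}
```

For example, withFork(session, (s) => s.prompt('Try a more formal tone.')) returns the alternative answer while the original session's history stays as it was.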
Prompting the Model
Request-based (wait for the full response)
Use prompt() when you want the complete output before rendering:
const result = await session.prompt('Write me a short haiku about coffee.');
console.log(result);
Pass an AbortController signal to add a stop button:
const controller = new AbortController();
stopBtn.onclick = () => controller.abort();
const result = await session.prompt('Write me a poem!', {
  signal: controller.signal,
});
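The same signal can enforce a time limit instead of waiting for a click. A minimal sketch, assuming an aborted prompt() rejects its promise; `promptWithTimeout` is an illustrative name, not part of the API:

```javascript
// Illustrative helper: resolves to the model's reply, or to null if `ms`
// milliseconds pass first. Any other error is rethrown.
async function promptWithTimeout(session, text, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await session.prompt(text, { signal: controller.signal });
  } catch (err) {
    if (controller.signal.aborted) return null; // we cancelled it ourselves
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

Returning null on timeout keeps the calling code simple: a small model on slow hardware can take a while, and the UI can fall back to a "try again" state without a try/catch at every call site.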
Streaming (show output as it generates)
Use promptStreaming() for longer responses.
It returns a ReadableStream where each chunk is a delta: the new tokens only.
Accumulate them yourself:
const stream = session.promptStreaming('Explain how a browser renders a web page.');
let fullText = '';
for await (const chunk of stream) {
  fullText += chunk;
  outputEl.textContent = fullText;
}
This is the right pattern. Don't replace the display with each raw chunk or you'll get flickering (each chunk is only a word or two).
Append Messages
session.append() lets you pre-load context into the session without triggering a response. This is useful when you want the model to process heavy inputs (like images) while the user is still typing their question.
// Pre-load context
await session.append([{
  role: 'user',
  content: 'Here is the document you will answer questions about: ...'
}]);
// Later, ask the question
const answer = await session.prompt('What are the key takeaways?');
The promise from append() resolves once the input has been processed and is ready in the session's context.
Image Input (Multimodal)
If your device has a GPU with more than 4 GB VRAM, the model can process images.
Image input needs its own session: you must declare { type: 'image' } in expectedInputs at creation time, and check availability separately, since not all devices support it.
const imageAvail = await LanguageModel.availability({
  expectedInputs: [{ type: 'text', languages: ['en'] }, { type: 'image' }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
if (imageAvail === 'unavailable') {
  // GPU requirement not met
  return;
}
const imageSession = await LanguageModel.create({
  expectedInputs: [{ type: 'text', languages: ['en'] }, { type: 'image' }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
const imageBlob = await fetch('photo.jpg').then(r => r.blob());
const result = await imageSession.prompt([{
  role: 'user',
  content: [
    { type: 'text', value: 'What is in this image?' },
    { type: 'image', value: imageBlob },
  ],
}]);
The API accepts Blob, HTMLImageElement, HTMLCanvasElement, ImageBitmap, ImageData, and more. In the playground, you can upload any image file and ask the model about it.
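For an upload flow, the message shape is the only fiddly part. Here is a small sketch of a builder for it; `buildImagePrompt` is a hypothetical helper, and passing several images in one message is an assumption you should verify on your Chrome build:

```javascript
// Illustrative helper: builds the message array for one text question
// plus one or more images (Blob, File, ImageBitmap, canvas, ...).
function buildImagePrompt(question, images) {
  const list = Array.isArray(images) ? images : [images];
  return [{
    role: 'user',
    content: [
      { type: 'text', value: question },
      ...list.map((img) => ({ type: 'image', value: img })),
    ],
  }];
}
```

With a file input, a File is already a Blob, so imageSession.prompt(buildImagePrompt('What is in this image?', fileInput.files[0])) is enough to wire up the upload.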
Response Prefix — Force an Output Format
One of my favourite features.
You can prefill the start of the assistant's response by passing an assistant-role message with prefix: true.
The model is forced to continue from that prefix.
This is a clean way to lock in an output format without relying on instruction-following:
// Force the model to start its response with ```toml
const result = await session.prompt([
  {
    role: 'user',
    content: 'Create a character sheet for a gnome barbarian.',
  },
  {
    role: 'assistant',
    content: '```toml\n',
    prefix: true,
  },
]);
// result continues from the prefix: ```toml\n[character]\nname = "...
In the playground, you can set any prefix string and watch the model continue from exactly that point.
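If you want the complete text starting at the prefix, it is worth checking whether the string your Chrome build returns already includes it. The sketch below handles both cases; `promptWithPrefix` is an illustrative name of mine:

```javascript
// Illustrative helper: prompts with a forced prefix and returns the full
// text starting at that prefix, whether or not the API echoes it back.
async function promptWithPrefix(session, userText, prefix) {
  const continuation = await session.prompt([
    { role: 'user', content: userText },
    { role: 'assistant', content: prefix, prefix: true },
  ]);
  return continuation.startsWith(prefix) ? continuation : prefix + continuation;
}
```

That way downstream code, such as something that extracts the fenced TOML block, can rely on the output always beginning with the prefix.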
Boolean Classification
Need a fast yes/no answer? Pass { type: 'boolean' } as the responseConstraint and you'll always get back a raw true or false — no parsing, no prompt engineering around output format:
const raw = await session.prompt(
  `Is this post about pottery?\n\n"${text}"`,
  { responseConstraint: { type: 'boolean' } }
);
const result = JSON.parse(raw); // true or false
This is great for content moderation, topic detection, or gating features based on page content.
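Topic detection over a list of posts could look roughly like this. Both `isAbout` and `filterByTopic` are hypothetical helpers; the defensive parse is my own addition in case the output ever isn't clean JSON:

```javascript
// Illustrative helper: asks a yes/no question about `text` and returns a
// strict boolean. Anything other than the JSON literal true counts as false.
async function isAbout(session, topic, text) {
  const raw = await session.prompt(
    `Is this post about ${topic}?\n\n"${text}"`,
    { responseConstraint: { type: 'boolean' } }
  );
  try {
    return JSON.parse(raw) === true;
  } catch {
    return false; // unexpected, non-JSON output
  }
}

// Keeps only the posts the model says match the topic, one prompt at a time.
async function filterByTopic(session, topic, posts) {
  const kept = [];
  for (const post of posts) {
    if (await isAbout(session, topic, post)) kept.push(post);
  }
  return kept;
}
```

Every call adds a turn to the session, so for long lists it may be worth running each check on a fresh clone to keep the context window from filling up.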
Structured Output with JSON Schema
The full power of responseConstraint is a complete JSON Schema.
The model is constrained to produce valid JSON that matches your schema: no hallucinated keys, no wrong types.
const schema = {
  type: 'object',
  properties: {
    sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
    score: { type: 'number', minimum: 0, maximum: 10 },
    summary: { type: 'string' },
  },
  required: ['sentiment', 'score', 'summary'],
};
const raw = await session.prompt(
  `Analyze the sentiment of this review:\n\n"${reviewText}"`,
  { responseConstraint: schema }
);
const data = JSON.parse(raw);
// { sentiment: 'positive', score: 8.5, summary: '...' }
Note: the schema itself uses some tokens from your context window. You can measure how many by passing the prompt together with the constraint:
session.measureContextUsage(promptText, { responseConstraint: schema }).
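That measurement makes a simple pre-flight check possible. A sketch under one assumption: that measureContextUsage() takes the prompt as its first argument and the options as its second, mirroring prompt(). The helper name `fitsInContext` is illustrative:

```javascript
// Illustrative helper: reports whether `prompt` (plus options such as a
// responseConstraint) fits in what is left of the context window.
async function fitsInContext(session, prompt, options = {}) {
  const needed = await session.measureContextUsage(prompt, options);
  const remaining = session.contextWindow - session.contextUsage;
  return { needed, remaining, fits: needed <= remaining };
}
```

If fits comes back false, you can trim the input or start a fresh session rather than letting older turns get dropped silently.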
Putting it all together
Here's how these features combine in a real use case.
Say you're building a Chrome Extension that summarises product reviews on any e-commerce page:
// 1. Check availability
const avail = await LanguageModel.availability({
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
if (avail === 'unavailable') return;
// 2. Create a session with context
const session = await LanguageModel.create({
  initialPrompts: [{
    role: 'system',
    content: 'You analyse product reviews and extract structured insights.',
  }],
  expectedInputs: [{ type: 'text', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});
// 3. Pre-load the reviews while user looks at the page
await session.append([{
  role: 'user',
  content: `Here are the reviews:\n\n${scrapedReviews}`,
}]);
// 4. Get structured output
const schema = {
  type: 'object',
  properties: {
    verdict: { type: 'string', enum: ['buy', 'skip', 'depends'] },
    pros: { type: 'array', items: { type: 'string' } },
    cons: { type: 'array', items: { type: 'string' } },
    summary: { type: 'string' },
  },
  required: ['verdict', 'pros', 'cons', 'summary'],
};
const result = JSON.parse(
  await session.prompt('Summarise these reviews.', { responseConstraint: schema })
);
Zero API cost.
Runs entirely on the user's machine.
Works offline after first load.
Try the playground
The full playground is one HTML file with no dependencies and no build step:
github.com/lovestaco/gemini-brow
Clone it, open playground.html in Chrome 138+, enable the flags above, and every feature in this post is wired up and ready to experiment with.
The Prompt API is still evolving: language support is limited (en, ja, es for now), mobile isn't supported yet, and the model is small.
But the fundamentals are solid, and the use cases where it shines (classification, summarization, extraction, Q&A on focused content) are genuinely useful without touching your server budget.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: HexmosTech/git-lrc