v1 · client-side inference · no server

Search the web.
Answer privately.
AI runs on your device.

Webelves is a local-first search engine. Type a query — QAi fetches real web pages, finds the most relevant passages, and generates a grounded answer using an AI model running entirely in your browser. No account. No cloud AI. Nothing leaves your machine.

what is the Fermi paradox?
explain dark matter
what is the Fermi paradox?
✦ QAi
4 sources
[1]
en.wikipedia.org/wiki/Fermi_paradox
The apparent contradiction between high probability estimates for ET civilizations and the lack of evidence…
[2]
space.com/25325-fermi-paradox.html
Enrico Fermi famously asked "Where is everybody?" in 1950, estimating ET contact should have occurred…
QAi
The Fermi paradox is the tension between high-probability estimates for intelligent civilizations [1] and the complete absence of observed contact [2]. First posed by Fermi in 1950, it remains unresolved.
Ask a follow-up…
wllama · SmolLM2-360M-Instruct ◉ ready
◉ AI runs on your device ✗ No cloud AI service ✗ No cloud queries ✓ Your data stays local ✓ Self-hosted search

Search. Fetch. Rank. Answer.

Every query goes through a four-stage pipeline. Fetching, ranking, and answering run in your browser tab; search goes to a SearXNG instance you host yourself, not to a cloud service.

[01]
Search
SearXNGAdapter queries your local SearXNG instance. Pluggable — swap in any SearchAdapter implementation. No tracking, no personalization.
[02]
Fetch & extract
Top 4 result pages fetched in-browser. @mozilla/readability strips chrome, extracts clean article text. Chunked into passage windows.
[03]
Rank
MiniLM embeddings via @xenova/transformers. Cosine similarity ranks passages against the query. Top 5 chunks forwarded to the LLM context.
[04]
Answer
wllama (llama.cpp WASM+SIMD) streams a grounded answer. Prompted to cite sources as [N] — citations are interactive and scroll to the source card.
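
As a rough illustration of the chunking and ranking stages, the sketch below splits text into overlapping word windows and ranks passages by cosine similarity against a query embedding. The helper names (`chunkWords`, `cosine`, `topK`), the window size, and the overlap are assumptions made for this example, not values taken from Webelves; embeddings are passed in as plain number arrays, on the assumption that the MiniLM model produced them in an earlier step.

```typescript
// Hypothetical helpers for illustration; sizes are not Webelves's actual settings.

/** Split text into overlapping windows of `size` words, advancing by `size - overlap`. */
function chunkWords(text: string, size = 120, overlap = 20): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}

/** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface Passage {
  text: string;
  sourceIndex: number; // 1-based index of the source card this passage came from
  embedding: number[]; // assumed to be produced by the embedding model
}

/** Return the k passages most similar to the query embedding, best first. */
function topK(query: number[], passages: Passage[], k = 5): Passage[] {
  return passages
    .map((p) => ({ p, score: cosine(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.p);
}
```

Keeping `sourceIndex` on each passage is one way to let a later stage number the passages for [N] citations.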

One engine. Every tab.

Built around the premise that loading a GGUF model once and sharing it across sessions is both faster and more honest than pretending each page is independent.

Shared wllama worker
One wllama Worker per page load. All open search tabs share it via runExclusive, so the model is loaded once and held in RAM once. Tabs waiting for the model show a "waiting" badge and queue cleanly. One model download, many searches open side by side.
wllama · runExclusive · OPFS cache
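
One common way to get this kind of exclusive access is a promise-chain queue, sketched below. This is an assumption about the general technique, not the Webelves source or a wllama API: the class name `ExclusiveQueue` and the `pending` counter are hypothetical, and only the method name `runExclusive` is borrowed from the text above.

```typescript
// Hypothetical sketch of a promise-chain mutex; not the Webelves or wllama implementation.
class ExclusiveQueue {
  private tail: Promise<void> = Promise.resolve();
  private waiting = 0;

  /** Number of tasks queued or running; could drive a "waiting" badge. */
  get pending(): number {
    return this.waiting;
  }

  /** Run `task` only after every previously queued task has settled. */
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    this.waiting++;
    const result = this.tail.then(task);
    // Advance the chain whether the task resolved or rejected,
    // so one failed generation does not block the tabs behind it.
    this.tail = result.then(
      () => { this.waiting--; },
      () => { this.waiting--; },
    );
    return result;
  }
}
```

Each tab would wrap its call into the shared worker in `queue.runExclusive(() => ...)`, so only one generation touches the model at a time while the others wait in order.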
Grounded answers, real citations
The AI only sees passages from the pages it actually fetched, which keeps answers tied to retrieved text. It is prompted to cite each claim inline as [N]. Click a citation and the source card highlights and scrolls into view. Every cited claim traces back to a real URL you can inspect.
RAG · MiniLM · structured citations
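
To make the grounding and citation idea concrete, here is a minimal sketch of a prompt builder that numbers each passage and a parser that pulls valid [N] markers back out of the answer. The function names and the prompt wording are assumptions for illustration; the actual Webelves prompt is not shown in this page.

```typescript
interface SourcePassage {
  url: string;
  text: string;
}

/** Build a prompt that numbers each passage so the model can cite it as [N]. */
function buildGroundedPrompt(question: string, passages: SourcePassage[]): string {
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.url}\n${p.text}`)
    .join("\n\n");
  return [
    "Answer using only the numbered passages below.", // hypothetical wording
    "Cite each claim inline as [N]. If the passages do not answer the question, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}

/** Return the distinct citation numbers that point at a real source, in first-seen order. */
function extractCitations(answer: string, sourceCount: number): number[] {
  const seen = new Set<number>();
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n >= 1 && n <= sourceCount) seen.add(n); // ignore numbers with no matching source
  }
  return Array.from(seen);
}
```

Dropping out-of-range numbers means a marker the model invented never becomes a clickable citation.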
Transparent memory
After each chat, the model extracts 0–3 durable facts and writes them to memories.json in OPFS. The full list is visible, editable, and wipeable in Settings — no hidden context, no surprise personalization, no cloud profile.
OPFS · visible · user-controlled
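
The memory file could be read and written along the following lines. This is a sketch under stated assumptions: `memories.json` is assumed to hold a JSON array of strings, the function names are hypothetical, and the small `DirLike` interfaces only mirror the parts of the browser's OPFS handles used here, so the code can also be exercised outside a browser.

```typescript
// Minimal structural types mirroring the OPFS calls used below (an assumption for
// portability; in a browser, the handle from navigator.storage.getDirectory() fits).
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<{ text(): Promise<string> }>;
  createWritable(): Promise<WritableLike>;
}
interface DirLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

const MEMORY_FILE = "memories.json"; // file name taken from the text above
const MAX_FACTS_PER_CHAT = 3; // the text above says 0 to 3 facts per chat

/** Read the stored facts; an empty or non-array file yields an empty list. */
async function loadMemories(dir: DirLike): Promise<string[]> {
  const handle = await dir.getFileHandle(MEMORY_FILE, { create: true });
  const raw = await (await handle.getFile()).text();
  if (raw.trim() === "") return []; // freshly created file
  const parsed: unknown = JSON.parse(raw);
  return Array.isArray(parsed)
    ? parsed.filter((x): x is string => typeof x === "string")
    : [];
}

/** Overwrite the file with the given facts as pretty-printed JSON. */
async function saveMemories(dir: DirLike, facts: string[]): Promise<void> {
  const handle = await dir.getFileHandle(MEMORY_FILE, { create: true });
  const writable = await handle.createWritable();
  await writable.write(JSON.stringify(facts, null, 2));
  await writable.close();
}

/** Append at most three new, non-duplicate facts extracted from one chat. */
async function rememberFacts(dir: DirLike, extracted: string[]): Promise<string[]> {
  const existing = await loadMemories(dir);
  const fresh = extracted
    .map((f) => f.trim())
    .filter((f, i, all) => f !== "" && !existing.includes(f) && all.indexOf(f) === i)
    .slice(0, MAX_FACTS_PER_CHAT);
  const merged = [...existing, ...fresh];
  await saveMemories(dir, merged);
  return merged;
}
```

Because the file is plain JSON, a Settings screen can list, edit, or wipe it with the same two functions: `loadMemories` to display and `saveMemories` with an edited or empty array.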
Pluggable search backend
SearchAdapter interface ships with SearXNGAdapter. CORS-proxy and browser-extension adapters are planned for v1.5. Self-host SearXNG, point Webelves at it, keep every query off third-party search infrastructure.
SearXNG · extensible · self-hosted
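
A pluggable adapter of this kind might look like the sketch below. The source names `SearchAdapter` and `SearXNGAdapter` but does not show their members, so the `search(query, limit)` signature, the `SearchResult` fields, and the class name `SearXNGAdapterSketch` are all assumptions. The request uses SearXNG's `/search` endpoint with `format=json`; whether JSON output is enabled depends on the instance's settings, so that is worth checking on your own instance.

```typescript
interface SearchResult {
  url: string;
  title: string;
  snippet: string;
}

/** Hypothetical shape; the real interface members are not shown in the source. */
interface SearchAdapter {
  search(query: string, limit: number): Promise<SearchResult[]>;
}

/** Minimal fetch-like function so the adapter can be tested without a network. */
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class SearXNGAdapterSketch implements SearchAdapter {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
    this.fetchFn = fetchFn;
  }

  async search(query: string, limit: number): Promise<SearchResult[]> {
    const url = `${this.baseUrl}/search?q=${encodeURIComponent(query)}&format=json`;
    const res = await this.fetchFn(url);
    if (!res.ok) throw new Error(`SearXNG returned HTTP ${res.status}`);
    const body = (await res.json()) as {
      results?: Array<{ url?: string; title?: string; content?: string }>;
    };
    return (body.results ?? [])
      .filter((r) => typeof r.url === "string") // skip entries without a URL
      .slice(0, limit)
      .map((r) => ({
        url: r.url as string,
        title: r.title ?? "",
        snippet: r.content ?? "",
      }));
  }
}
```

In a browser this could be constructed as `new SearXNGAdapterSketch("http://localhost:8080", (u) => fetch(u))`. Any other backend only has to implement the same `search` method to be swapped in.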

GGUF in the browser. No GPU.

wllama runs llama.cpp compiled to WASM with SIMD extensions. Any quantized GGUF model that fits in your browser's memory works. Start with a small, fast model and upgrade when your hardware supports it.

[Q4]
SmolLM2-360M · default
The standard model across Webelves. Fast first-token on most machines; downloads once, then runs offline from OPFS.
[Q4]
Qwen2.5-0.5B · optional
Optional alternative. Swap it in from Settings if you prefer it. Same wllama runtime.
[Q4]
Gemma-3-270M · optional
Light-tier variant. To add your own GGUF instead, paste any Hugging Face GGUF URL in Settings.
[Q4]
Any GGUF · custom URL
Point Webelves at any Hugging Face-hosted GGUF URL in Settings. One-time OPFS download, then cached locally until the site's data is cleared.

Browser-native shortcuts.

Webelves behaves like a browser. Familiar shortcuts work as expected.

Shortcut      Action
Ctrl + T      New search tab
Ctrl + W      Close active tab
Ctrl + J      Toggle QAi panel
Ctrl + ,      Open Settings
Ctrl + 1–9    Switch to tab N
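
The table above could be wired up with a small lookup like the one below. It is only a sketch: the action names and the `resolveShortcut` function are hypothetical, and `KeyLike` simply mirrors the two `KeyboardEvent` fields the lookup reads.

```typescript
type ShortcutAction =
  | { type: "new-tab" }
  | { type: "close-tab" }
  | { type: "toggle-qai" }
  | { type: "open-settings" }
  | { type: "switch-tab"; index: number };

/** The two KeyboardEvent fields this lookup needs. */
interface KeyLike {
  ctrlKey: boolean;
  key: string;
}

/** Map a key press to an action from the shortcut table, or null if it is not a shortcut. */
function resolveShortcut(e: KeyLike): ShortcutAction | null {
  if (!e.ctrlKey) return null;
  const key = e.key.toLowerCase();
  switch (key) {
    case "t":
      return { type: "new-tab" };
    case "w":
      return { type: "close-tab" };
    case "j":
      return { type: "toggle-qai" };
    case ",":
      return { type: "open-settings" };
  }
  if (/^[1-9]$/.test(key)) return { type: "switch-tab", index: Number(key) };
  return null;
}
```

A `keydown` listener would call `resolveShortcut(event)` and, when it returns an action, call `event.preventDefault()` before dispatching it. Some browsers reserve combinations such as Ctrl + T for themselves, so whether the page receives them may depend on how Webelves is opened.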

One click. No install.

Open Webelves in any modern browser. Set your SearXNG URL in Settings. The model downloads once to OPFS — cached locally, available offline from then on.

# self-host SearXNG (required for search)
docker run -d --name searxng -p 8080:8080 searxng/searxng
# then point Webelves at: http://localhost:8080