v1 · client-side inference · no inference server

Search the web.
Answer privately.
AI runs on your device.

Webelves is a local-first search engine. Type a query and QAi fetches real web pages, finds the most relevant passages, and generates a grounded answer using an AI model running entirely in your browser. No account. No cloud AI. Inference never leaves your machine.

Open Webelves → See product overview ↗

Example queries: what is the Fermi paradox? · explain dark matter

Query: what is the Fermi paradox?

✦ QAi · 4 sources (first two shown)

[1] en.wikipedia.org/wiki/Fermi_paradox
The apparent contradiction between high probability estimates for ET civilizations and the lack of evidence…

[2] space.com/25325-fermi-paradox.html
Enrico Fermi famously asked "Where is everybody?" in 1950, estimating ET contact should have occurred…

QAi: The Fermi paradox is the tension between high-probability estimates for intelligent civilizations [1] and the complete absence of observed contact [2]. First posed by Fermi in 1950, it remains unresolved.

Status: wllama · SmolLM2-360M-Instruct · ready

How it works

Search. Fetch. Rank. Answer.

Every query goes through a four-stage pipeline, all running in your browser tab, with no backend beyond the SearXNG instance you host.

[01]

Search

SearXNGAdapter queries your local SearXNG instance. The backend is pluggable: swap in any SearchAdapter implementation. No tracking, no personalization.
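The SearchAdapter contract is not spelled out on this page. Below is a minimal sketch of what it and a SearXNG-backed implementation could look like, assuming the adapter calls SearXNG's `/search?q=<query>&format=json` endpoint. The `SearchResult` shape, the `search(query, limit)` signature, and the injectable `fetchFn` are hypothetical. Note that SearXNG only answers `format=json` when `json` is listed under `search.formats` in the instance's settings.yml, and a page on another origin can only read the response if the instance sends CORS headers allowing it.

```typescript
// Hypothetical result shape; the real Webelves type may differ.
interface SearchResult {
  url: string;
  title: string;
  snippet: string;
}

// Hypothetical adapter contract: any backend that can turn a query into results.
interface SearchAdapter {
  search(query: string, limit: number): Promise<SearchResult[]>;
}

// Minimal fetch signature so the adapter can be tested without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shape of the fields this sketch reads from SearXNG's JSON response.
interface SearXNGResponse {
  results?: Array<{ url: string; title?: string; content?: string }>;
}

class SearXNGAdapter implements SearchAdapter {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  // Assumes SearXNG is served at the root of baseUrl (e.g. http://localhost:8080).
  constructor(baseUrl: string, fetchFn: FetchLike = (u) => fetch(u)) {
    this.baseUrl = baseUrl;
    this.fetchFn = fetchFn;
  }

  async search(query: string, limit: number): Promise<SearchResult[]> {
    const url = new URL("/search", this.baseUrl);
    url.searchParams.set("q", query);
    url.searchParams.set("format", "json"); // requires json in search.formats
    const res = await this.fetchFn(url.toString());
    if (!res.ok) throw new Error(`SearXNG returned HTTP ${res.status}`);
    const body = (await res.json()) as SearXNGResponse;
    // Keep only the first `limit` results and normalize missing fields.
    return (body.results ?? []).slice(0, limit).map((r) => ({
      url: r.url,
      title: r.title ?? r.url,
      snippet: r.content ?? "",
    }));
  }
}
```

Injecting the fetch function keeps the adapter testable without a live instance, and a future CORS-proxy or browser-extension adapter would only need to implement the same `search` method.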

[02]

Fetch & extract

Top 4 result pages fetched in-browser. @mozilla/readability strips page chrome (navigation, ads, sidebars) and extracts clean article text, which is then chunked into passage windows.
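Readability's `parse()` returns an object whose `textContent` holds the extracted article text (or `null` when extraction fails). How Webelves sizes its passage windows is not stated here; the sketch below assumes overlapping word windows, with a hypothetical default of 120 words and a 30-word overlap, and tags each passage with the index of the page it came from.

```typescript
// Hypothetical passage record: text plus the index of the page it came from.
interface Passage {
  sourceIndex: number;
  text: string;
}

// Split extracted article text into overlapping word windows.
// windowWords and overlapWords are assumed defaults, not Webelves' documented values.
function chunkIntoPassages(text: string, sourceIndex: number, windowWords = 120, overlapWords = 30): Passage[] {
  if (overlapWords >= windowWords) throw new RangeError("overlap must be smaller than the window");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = windowWords - overlapWords;
  const passages: Passage[] = [];
  for (let start = 0; start < words.length; start += step) {
    passages.push({ sourceIndex, text: words.slice(start, start + windowWords).join(" ") });
    // Stop once a window reaches the end, so no trailing window is pure overlap.
    if (start + windowWords >= words.length) break;
  }
  return passages;
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one passage, at the cost of some duplicated text.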

[03]

Rank

MiniLM embeddings via @xenova/transformers. Cosine similarity ranks passages against the query. Top 5 chunks forwarded to the LLM context.
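A minimal sketch of the ranking step, assuming the query and every passage have already been embedded. With Transformers.js that could be `await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2")`, then calling the extractor with `{ pooling: "mean", normalize: true }`; which MiniLM checkpoint Webelves actually loads is an assumption. `rankPassages` is a hypothetical name, and its `k = 5` default mirrors the "top 5 chunks" above.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every passage against the query and keep the k best, highest first.
function rankPassages<T>(
  queryVec: number[],
  passages: Array<{ item: T; vec: number[] }>,
  k = 5,
): Array<{ item: T; score: number }> {
  return passages
    .map((p) => ({ item: p.item, score: cosineSimilarity(queryVec, p.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

When vectors are already L2-normalized, as `normalize: true` produces, cosine similarity reduces to a plain dot product; the full formula is kept so the function also works on unnormalized input.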

[04]

Answer

wllama (llama.cpp compiled to WASM with SIMD) streams a grounded answer. The model is prompted to cite sources as [N]; citations are interactive and scroll to the matching source card.
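The exact prompt Webelves sends is not shown on this page. A hypothetical `buildGroundedPrompt` sketch: it groups the ranked chunks by page, numbers each page once so that [N] in the answer points at one source card, and asks for inline citations. The instruction wording is an assumption; with an instruct model such as SmolLM2-360M-Instruct, the string would normally be wrapped in the model's chat template before being handed to wllama for streaming.

```typescript
// A ranked chunk and the page it was extracted from.
interface RankedChunk {
  url: string;
  text: string;
}

// Build a citation-friendly prompt. Pages are numbered in order of first
// appearance, so [1] always refers to the page of the best-ranked chunk.
function buildGroundedPrompt(question: string, chunks: RankedChunk[]): { prompt: string; sources: string[] } {
  const sources: string[] = [];
  const bySource = new Map<string, string[]>();
  for (const c of chunks) {
    if (!bySource.has(c.url)) {
      bySource.set(c.url, []);
      sources.push(c.url);
    }
    bySource.get(c.url)!.push(c.text);
  }
  const context = sources
    .map((url, i) => `[${i + 1}] ${url}\n${bySource.get(url)!.join("\n")}`)
    .join("\n\n");
  const prompt = [
    "Answer the question using only the numbered sources below.",
    "Cite each claim inline as [N], where N is the source number.",
    "If the sources do not contain the answer, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
  return { prompt, sources };
}
```

Returning `sources` alongside the prompt lets the UI map each [N] back to the same URL the model saw.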

Why Webelves

One engine. Every tab.

Built around one constraint: load the GGUF model once and share it across every search tab. That is faster, and more honest about what your device is actually doing, than pretending each tab is independent.

Shared wllama worker

One wllama Worker per page load. Every Webelves tab shares it through runExclusive, so the model is never loaded twice or duplicated in RAM. While one tab is generating, the others show a "waiting" badge and queue in order. One model download, many tabs searching side by side.
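`runExclusive` is the name of the mutex method in libraries such as async-mutex; whether Webelves uses that library is not stated. A self-contained stand-in, with the hypothetical name `ExclusiveRunner`, shows the queuing behavior described above: tasks run one at a time in arrival order, a failed task does not block the queue, and `pending` counts tasks still waiting, which a tab could use to show its "waiting" badge.

```typescript
// Promise-chain mutex: each task starts only after the previous one settles.
class ExclusiveRunner {
  private tail: Promise<void> = Promise.resolve();
  private waiting = 0;

  // Number of tasks queued but not yet started.
  get pending(): number {
    return this.waiting;
  }

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    this.waiting++;
    const result = this.tail.then(() => {
      this.waiting--; // this task now holds the lock
      return task();
    });
    // Swallow the outcome on the internal chain so a failure does not stall later tasks;
    // the caller still sees the rejection through `result`.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

A tab would wrap its whole generation call, for example `runner.runExclusive(() => generateAnswer(prompt))`, where `generateAnswer` is a hypothetical wrapper around the shared wllama instance.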

wllama · runExclusive · OPFS cache

Grounded answers, real citations

The AI only sees passages from the pages it actually fetched, which keeps answers tied to real sources rather than the model's guesswork. Claims are cited inline as [N]. Click a citation and its source card highlights and scrolls into view, so every cited claim traces back to a real URL you can inspect.
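To make citations clickable, the streamed answer has to be split into plain text and [N] markers. A hypothetical `segmentAnswer` sketch: markers that point at a real source become `cite` segments, while out-of-range numbers stay as plain text. Rendering each `cite` segment as a button that calls `scrollIntoView()` on the matching source card is one way to get the behavior described above; that wiring is assumed and not shown.

```typescript
// Either a run of answer text or a citation of source number n (1-based).
type Segment = { kind: "text"; text: string } | { kind: "cite"; n: number };

function segmentAnswer(answer: string, sourceCount: number): Segment[] {
  const segments: Segment[] = [];
  const marker = /\[(\d+)\]/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    const n = Number(m[1]);
    // Markers without a matching source are left in the surrounding text.
    if (n < 1 || n > sourceCount) continue;
    if (m.index > last) segments.push({ kind: "text", text: answer.slice(last, m.index) });
    segments.push({ kind: "cite", n });
    last = m.index + m[0].length;
  }
  if (last < answer.length) segments.push({ kind: "text", text: answer.slice(last) });
  return segments;
}
```

Because no characters are dropped, re-joining the segments reproduces the original answer exactly, which makes the function safe to rerun on every streamed token.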

RAG · MiniLM · structured citations

Transparent memory

After each chat, the model extracts 0–3 durable facts and writes them to memories.json in OPFS. The full list is visible, editable, and wipeable in Settings — no hidden context, no surprise personalization, no cloud profile.
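The memories.json schema is not documented here, so the sketch below assumes a flat array of `{ fact, createdAt }` records and that the model's 0–3 extracted facts arrive as plain strings. It keeps at most three new facts per chat, skips case-insensitive duplicates, and rewrites the whole file. `DirLike` is a hypothetical structural slice of the OPFS `FileSystemDirectoryHandle` API; in a browser you would pass the handle from `navigator.storage.getDirectory()`, and a wipe could call that handle's `removeEntry("memories.json")`.

```typescript
// Hypothetical record shape for memories.json.
interface Memory {
  fact: string;
  createdAt: string;
}

// The subset of FileSystemDirectoryHandle this sketch relies on.
interface DirLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<{
    getFile(): Promise<{ text(): Promise<string> }>;
    createWritable(): Promise<{ write(data: string): Promise<void>; close(): Promise<void> }>;
  }>;
}

const MEMORY_FILE = "memories.json";
const MAX_NEW_FACTS = 3; // the page states 0–3 facts per chat

async function loadMemories(dir: DirLike): Promise<Memory[]> {
  const handle = await dir.getFileHandle(MEMORY_FILE, { create: true });
  const text = await (await handle.getFile()).text();
  return text.trim() === "" ? [] : (JSON.parse(text) as Memory[]);
}

async function appendMemories(dir: DirLike, facts: string[], now = new Date()): Promise<Memory[]> {
  const memories = await loadMemories(dir);
  const known = new Set(memories.map((m) => m.fact.trim().toLowerCase()));
  for (const fact of facts.slice(0, MAX_NEW_FACTS)) {
    const key = fact.trim().toLowerCase();
    if (key === "" || known.has(key)) continue; // skip blanks and duplicates
    known.add(key);
    memories.push({ fact: fact.trim(), createdAt: now.toISOString() });
  }
  // Rewrite the whole file so Settings always shows exactly what is stored.
  const handle = await dir.getFileHandle(MEMORY_FILE, { create: true });
  const writable = await handle.createWritable();
  await writable.write(JSON.stringify(memories, null, 2));
  await writable.close();
  return memories;
}
```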

OPFS · visible · user-controlled

Pluggable search backend

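The SearchAdapter interface ships with SearXNGAdapter; CORS-proxy and browser-extension adapters are planned for v1.5. Self-host SearXNG and point Webelves at it. SearXNG is a metasearch engine, so queries still reach upstream search engines, but they arrive from your own instance with no cookies or profile tied to you.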

SearXNG · extensible · self-hosted

Runtime

GGUF in the browser. No GPU.

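wllama runs llama.cpp compiled to WASM with SIMD extensions, so inference runs on the CPU. Quantized GGUF models for architectures llama.cpp supports can be loaded, as long as they fit in your device's memory. Start with a small, fast model and upgrade when your hardware allows.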
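As a rough guide to what "small" means for the Q4 models below, a worked download-size estimate. It assumes a Q4_0-style layout, where each block of 32 weights stores 4-bit values plus one 16-bit scale, and uses the nominal parameter counts from the model names. Real Q4 files differ somewhat because formats such as Q4_K_M mix quantization levels and some tensors stay at higher precision; runtime RAM also includes the context (KV) cache.

```latex
b = \frac{32 \times 4 + 16}{32} = 4.5 \ \text{bits per weight},
\qquad
\text{size} \approx N \cdot \frac{b}{8} \ \text{bytes}

\begin{aligned}
\text{Gemma-3-270M:} \quad & 270 \times 10^{6} \times 0.5625 \approx 152 \ \text{MB} \\
\text{SmolLM2-360M:} \quad & 360 \times 10^{6} \times 0.5625 \approx 202.5 \ \text{MB} \\
\text{Qwen2.5-0.5B:} \quad & 500 \times 10^{6} \times 0.5625 \approx 281 \ \text{MB}
\end{aligned}
```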

[Q4]

SmolLM2-360M · Default

The default model across Webelves. Fast time to first token on most machines; downloads once, then runs offline from OPFS.

[Q4]

Qwen2.5-0.5B · optional

Optional alternative. Swap it in from Settings if you prefer it. Same wllama runtime.

[Q4]

Gemma-3-270M · optional

Light-tier variant and the smallest model listed here. Same wllama runtime.

[Q4]

Any GGUF · custom URL

Point Webelves at any Hugging Face-hosted GGUF URL in Settings. It downloads once to OPFS and stays cached locally until you remove it or clear site data.
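Hugging Face serves a repository file's raw bytes under `/resolve/<revision>/`, while `/blob/<revision>/` is the HTML viewer page, so a link copied from the browser's address bar usually needs rewriting before download. A minimal sketch of a hypothetical `normalizeGgufUrl` helper that checks the scheme and extension and performs that rewrite; whether Webelves validates custom URLs this way is an assumption.

```typescript
// Hypothetical helper: validate a custom GGUF URL and point it at the raw file.
function normalizeGgufUrl(input: string): string {
  const url = new URL(input); // throws on malformed input
  if (url.protocol !== "https:") throw new Error("expected an https URL");
  if (!url.pathname.toLowerCase().endsWith(".gguf")) throw new Error("expected a .gguf file");
  if (url.hostname === "huggingface.co") {
    // "/owner/repo/blob/rev/file.gguf" is the viewer page; "/resolve/" serves the bytes.
    url.pathname = url.pathname.replace(/^\/([^/]+)\/([^/]+)\/blob\//, "/$1/$2/resolve/");
  }
  return url.toString();
}
```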

Keyboard

Browser-native shortcuts.

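Webelves behaves like a browser, and familiar shortcuts act on its own tabs and panels. Some browsers reserve Ctrl + T and Ctrl + W for their own tabs and may not pass them to the page.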

Shortcut	Action
Ctrl + T	New search tab
Ctrl + W	Close active tab
Ctrl + J	Toggle QAi panel
Ctrl + ,	Open Settings
Ctrl + 1–9	Switch to tab N
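A minimal sketch of mapping the table above to actions, independent of any UI framework. `shortcutFor` and the `Action` union are hypothetical; a real handler would call it from a `keydown` listener and call `preventDefault()` when it returns an action. Only `ctrlKey` is checked; also honoring `metaKey` for macOS is left out.

```typescript
// Hypothetical actions matching the shortcut table.
type Action =
  | { kind: "newTab" }
  | { kind: "closeTab" }
  | { kind: "toggleQAi" }
  | { kind: "openSettings" }
  | { kind: "switchTab"; index: number }; // zero-based: Ctrl + 1 selects index 0

// The KeyboardEvent fields this sketch reads.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
}

function shortcutFor(e: KeyLike): Action | null {
  if (!e.ctrlKey) return null;
  switch (e.key.toLowerCase()) {
    case "t":
      return { kind: "newTab" };
    case "w":
      return { kind: "closeTab" };
    case "j":
      return { kind: "toggleQAi" };
    case ",":
      return { kind: "openSettings" };
  }
  if (/^[1-9]$/.test(e.key)) return { kind: "switchTab", index: Number(e.key) - 1 };
  return null;
}
```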

Get started

One click. No install.

Open Webelves in any modern browser and set your SearXNG URL in Settings. The model downloads once to OPFS and is cached locally, so answers are generated offline from then on; searching and fetching pages still need a network connection.

Open Webelves → Product overview ↗

# self-host SearXNG (required for search)
docker run -d --name searxng -p 8080:8080 searxng/searxng
# then point Webelves at: http://localhost:8080
