🌸 Inspired by Petals & BitTorrent
OpenHydra splits big language models across volunteer laptops — BitTorrent-style. Your machine serves a slice of Qwen 3.5. You earn HYDRA tokens. Nobody needs a credit card. No cloud required.
Did you buy a stack of M4 Mac Minis to run local models? Welcome home. OpenHydra’s architecture is explicitly designed to pool Apple Silicon’s Unified Memory across the internet. Leave your Mac running in the background, seed the swarm, and let your hardware earn HYDRA credits while you sleep.
The five-year-old version
Glad you asked. It’s honestly not that complicated once you stop calling it “AI infrastructure”.
A 70-billion-parameter model weighs ~140 GB at 16-bit precision. That's not fitting on your laptop. But split across 8 laptops? Now we're talking.
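The arithmetic behind that claim is simple enough to check yourself. A quick sketch (the 2 bytes per parameter assumes fp16 weights; the peer count is illustrative):

```python
def model_size_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Total weight size in GB (fp16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

def shard_size_gb(n_params: float, n_peers: int, bytes_per_param: float = 2.0) -> float:
    """Per-peer share when the model is split evenly across the herd."""
    return model_size_gb(n_params, bytes_per_param) / n_peers

print(model_size_gb(70e9))     # 140.0 GB -- not fitting on one laptop
print(shard_size_gb(70e9, 8))  # 17.5 GB per peer -- now we're talking
```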
Like BitTorrent, everyone in the herd serves a shard. Your laptop handles one piece of the inference. Together the whole model runs.
Your node earns barter credits and HYDRA tokens for every request it serves. These are in-network credits — not crypto, not fiat — redeemable for inference on the swarm. A mystery-shopper bot checks quality. Good llamas get priority routing. Cheaters get slashed.
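To make the incentive loop concrete, here is a toy ledger sketch. Every name and number in it (credit per request, the slash fraction) is an illustrative assumption, not the real HYDRA protocol:

```python
class ToyLedger:
    """Illustrative in-network credit ledger -- NOT the real HYDRA protocol."""

    CREDIT_PER_REQUEST = 1  # hypothetical reward per served request
    SLASH_FRACTION = 0.5    # hypothetical penalty for a failed audit

    def __init__(self):
        self.balances = {}

    def record_served(self, node_id: str) -> None:
        # Nodes earn credits for every request they serve.
        self.balances[node_id] = self.balances.get(node_id, 0) + self.CREDIT_PER_REQUEST

    def audit(self, node_id: str, passed: bool) -> None:
        # Mystery-shopper check: cheaters get slashed.
        if not passed:
            self.balances[node_id] = int(
                self.balances.get(node_id, 0) * (1 - self.SLASH_FRACTION)
            )

ledger = ToyLedger()
for _ in range(10):
    ledger.record_served("good-llama")
ledger.record_served("cheater")
ledger.audit("cheater", passed=False)
print(ledger.balances)  # {'good-llama': 10, 'cheater': 0}
```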
Small models like Qwen 3.5 0.8B run on a single laptop. Bigger ones like Qwen 3 72B need 8 peers. The default install gets you going with Qwen 3.5 immediately — no beefy GPU required.
How it works
It’s three commands. Even your grandma could do it. (We’re not sure why your grandma would want distributed AI, but we respect the ambition.)
Clone, virtualenv, compile protobufs. Standard Python setup. You’ve done worse.
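In practice that looks something like this. The repo URL and proto paths are placeholders (check the actual docs for the real ones):

```shell
# Clone and enter the repo (URL is a placeholder -- use the real one)
git clone https://github.com/<org>/openhydra.git
cd openhydra

# Standard Python virtualenv setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Compile the protobuf definitions (path is illustrative)
python -m grpc_tools.protoc -I proto --python_out=. proto/*.proto
```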
One command. Your node automatically connects to bootstrap nodes on three continents (EU, US, AP) via the Hivemind Kademlia DHT. The default model is Qwen 3.5 0.8B — lightweight enough to run on a potato, smart enough to actually be useful.
Use the OpenAI-compatible API, earn HYDRA tokens for every request your node serves, and quietly feel good about contributing to decentralised AI.
What you get
We also have a proper features page in the docs, but here’s the version where we’re allowed to be slightly smug.
Change one URL. Your existing code works. /v1/chat/completions with SSE streaming, plus Ollama-compatible /api/chat for Open WebUI and Continue.dev.
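Because the endpoint mirrors OpenAI's chat API, pointing existing code at the swarm really is a one-URL change. A sketch using plain `requests` (the base URL, port, and model id are assumptions; substitute your node's address):

```python
import json

# Hypothetical local node address -- substitute your own.
BASE_URL = "http://localhost:8000"

# Standard OpenAI-style chat payload; only the URL changed.
payload = {
    "model": "qwen-3.5-0.8b",  # illustrative model id
    "messages": [{"role": "user", "content": "Why do llamas form herds?"}],
    "stream": True,            # SSE streaming, same as the OpenAI API
}

# Uncomment to actually send the request to a running node:
# import requests
# resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True)
# for line in resp.iter_lines():
#     print(line.decode())

print(json.dumps(payload, indent=2))
```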
4-phase Attention Matching keeps long conversations alive without nuking your VRAM. Based on arXiv:2602.16284 — we read the papers so you don’t have to.
HTTP DHT + Hivemind Kademlia across three continents. Auto-join on startup. If one bootstrap goes down, the llamas find another way. No single point of failure.
Tauri v2 app for macOS, Windows, and Linux: click "Start Node", watch credits accumulate. The desktop GUI is still in active development; for now, OpenHydra is CLI-only.
Ed25519 identity, X25519 ECDH + AES-256-GCM per hop, concentric onion routing, and differential privacy noise. No peer sees your full query. Overhead: 0.02%.
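The concentric onion routing is easier to see in code: the sender wraps the query in one encryption layer per hop, and each hop peels exactly one, so no single peer sees both the plaintext and the full route. This stdlib-only toy uses a SHA-256 XOR keystream as a stand-in for the real X25519 + AES-256-GCM; it is NOT secure, it just has the same shape:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256 (stand-in for AES-256-GCM; NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """XOR is symmetric, so the same call adds or peels a layer."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

# One symmetric key per hop (in reality derived via X25519 ECDH).
hop_keys = [b"hop-1-key", b"hop-2-key", b"hop-3-key"]

query = b"what is the airspeed of an unladen llama?"

# Sender wraps the query, innermost hop's layer first.
onion = query
for key in reversed(hop_keys):
    onion = xor_layer(onion, key)

# Each hop peels one layer as the message transits the route.
for key in hop_keys:
    onion = xor_layer(onion, key)

assert onion == query  # the final hop recovers the plaintext
```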
Zero-dependency Python client and a browser-native TypeScript SDK. The internal scaffolding exists; public release and docs are coming in v1.1.
Standing on the shoulders of giants
OpenHydra builds directly on two brilliant ideas. We want to be upfront about our inspirations, because intellectual honesty is cool (and mandatory if you don't want to get ratio'd on Hacker News).
“Run large language models at home, BitTorrent‑style.” Petals proved that volunteer compute can serve real LLM inference across the internet. We took that idea and bolted on a token economy, a desktop app, and a very strong llama motif. petals.dev →
Since 2001, BitTorrent has proved you can distribute enormous files to billions of people without a central server. If it works for a band’s entire discography, it can work for Qwen 3.5 tokens. Same energy. bittorrent.com →
🦎 Fun llama fact #1: Real llamas are pack animals because they share the load across the herd. The weakest llama doesn’t carry the whole tent. This is also the core architectural principle of OpenHydra.
🦎 Fun llama fact #2: A group of llamas is called a herd. OpenHydra’s network of peers is also called a herd. We are very consistent in our metaphors and proud of this.
🦎 Fun llama fact #3: The Hydra in Greek mythology had multiple heads — cut one off and two grow back. Our bootstrap nodes work the same way. (Please don’t cut our bootstrap nodes.)
🦎 Fun llama fact #4: Llamas can spit up to 10 feet when stressed. Our nodes politely return HTTP 503 instead. Both are valid responses to being overwhelmed.
What runs on it
The default is Qwen 3.5 0.8B — tiny enough for any laptop. Larger models shard automatically across multiple peers. NF4 quantisation cuts VRAM by 4x. Add any HuggingFace model by editing models.catalog.json.
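The 4x figure is just bits: NF4 stores 4 bits per weight versus 16 for fp16 (ignoring the small overhead from quantisation scales). A quick sanity check:

```python
def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Weight footprint in GB for a given per-parameter bit width."""
    return n_params * bits_per_param / 8 / 1e9

fp16 = weights_gb(70e9, 16)  # 140.0 GB
nf4 = weights_gb(70e9, 4)    # 35.0 GB
print(fp16 / nf4)            # 4.0 -- the advertised saving
```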
Five models ship in the default catalog.
If the requested model lacks peers, the coordinator gracefully degrades to the nearest available smaller model.
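A minimal sketch of that fallback logic, assuming the coordinator can see parameter counts and live peer counts per model. The catalog shape and model names here are made up for illustration:

```python
def pick_model(requested: str, catalog: dict) -> str:
    """Return the requested model if it has live peers, else the largest
    smaller model that does. Catalog maps name -> (params_b, n_peers)."""
    params, peers = catalog[requested]
    if peers > 0:
        return requested
    # Graceful degradation: nearest (largest) available smaller model.
    candidates = [
        (p, name) for name, (p, n) in catalog.items()
        if n > 0 and p < params
    ]
    if not candidates:
        raise RuntimeError("no model with live peers")
    return max(candidates)[1]

catalog = {
    "qwen-72b": (72.0, 0),    # no peers right now
    "qwen-7b": (7.0, 3),
    "qwen-0.8b": (0.8, 12),
}
print(pick_model("qwen-72b", catalog))  # qwen-7b
```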
Your laptop is sitting there doing nothing useful. It could be serving AI tokens and earning credits on the swarm. Three-headed llamas are waiting for you.