/ On-device AI · LiteRT
AI that runs on your phone.
BlockVault ships Gemma 4 models that run entirely on your device. Your prompts, transactions, and private keys never leave your phone. Need more power? Delegate to a GPU via x402 and pay only for what you use.
/ What is on-device AI
Your AI, your hardware.
On-device AI means running large language models on your phone's processor: no internet, no cloud API, no third-party servers. BlockVault uses LiteRT (the evolution of TensorFlow Lite) to execute quantized Gemma 4 models with hardware acceleration. Every inference happens locally. Your data stays private by design, not by policy.
Supported models
- Gemma 4 E2B — Ultra-fast for transaction review, address validation, and quick Q&A. Runs on any phone with 4 GB RAM.
- Gemma 4 E4B — Full agent capabilities: multi-step reasoning, scam detection, skill execution. Requires 6 GB RAM.
- LM Studio — Connect to a local LM Studio instance on your LAN for desktop-grade models without cloud latency.
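The RAM requirements above suggest a simple selection rule. The sketch below assumes the app picks the most capable bundled model whose RAM floor the device meets. The function name and model identifiers are hypothetical and do not come from BlockVault's code.

```python
# Hypothetical picker for the bundled models, using the RAM floors listed above
# (4 GB for Gemma 4 E2B, 6 GB for Gemma 4 E4B). Model identifiers are illustrative.
from typing import Optional

# (model identifier, minimum RAM in GB), ordered from most to least capable
MODEL_TIERS = [
    ("gemma-4-e4b", 6.0),
    ("gemma-4-e2b", 4.0),
]


def pick_bundled_model(ram_gb: float) -> Optional[str]:
    """Return the most capable bundled model the device can run, or None."""
    for model_id, min_ram_gb in MODEL_TIERS:
        if ram_gb >= min_ram_gb:
            return model_id
    # Below 4 GB no bundled model fits; the app would need another runtime.
    return None
```

Android usually reports somewhat less usable memory than the phone's nominal RAM. A real check would therefore need some tolerance rather than an exact comparison.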
/ Privacy by architecture
Zero data leaves your device.
Most AI wallets send your transaction history, balances, and prompts to a cloud API. BlockVault doesn't. Inference runs on your hardware, private keys live in hardware-backed storage, and the network only sees transactions you approve.
No cloud prompts
Your questions and wallet context stay on-device. Nothing is sent to OpenAI, Google, or any third-party inference API.
Hardware-backed keys
Private keys are stored in the Android Keystore (TEE/StrongBox). The AI model never sees raw keys. It can only submit signing requests, and nothing is signed until you approve it.
Works offline
On-device inference works without internet. Review transactions, detect phishing, and query your balances even in airplane mode.
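The key-handling rule above, where the model proposes and the user approves, amounts to a narrow interface between the agent and the key store. The sketch below models only that control flow. HMAC-SHA256 stands in for a real transaction signature, and every class and function name is hypothetical. Python cannot truly hide the key; on Android, the hardware-backed Keystore provides that enforcement.

```python
# Sketch of the "model proposes, user approves" boundary. HMAC-SHA256 stands in for
# a real transaction signature; on Android the key would be a non-exportable
# Keystore key (TEE/StrongBox). All names here are hypothetical.
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SigningRequest:
    to: str
    amount: str
    summary: str  # human-readable explanation produced by the model


class KeyVault:
    """Exposes only sign(); Python cannot enforce secrecy, the hardware keystore does."""

    def __init__(self, approve: Callable[[SigningRequest], bool]) -> None:
        self.__key = secrets.token_bytes(32)  # stand-in for a non-exportable key
        self._approve = approve  # e.g. a confirmation dialog shown to the user

    def sign(self, request: SigningRequest) -> Optional[bytes]:
        if not self._approve(request):
            return None  # user rejected: nothing is signed
        message = f"{request.to}|{request.amount}".encode()
        return hmac.new(self.__key, message, hashlib.sha256).digest()


def agent_propose(vault: KeyVault, to: str, amount: str) -> Optional[bytes]:
    # The model's only capability: submit a request and get back a signature or None.
    request = SigningRequest(to=to, amount=amount, summary=f"Send {amount} to {to}")
    return vault.sign(request)
```

The approval callback is the only path to a signature. A rejected request returns `None` and never touches the key.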
/ Local inference
Three runtimes, one wallet.
BlockVault supports three inference modes, so you can choose the right balance of speed, privacy, and capability. The options range from a 2B model that answers in milliseconds to a full GPU server for complex multi-step agent tasks.
- → On-device (LiteRT): no network latency, fully offline, max privacy
- → LM Studio (LAN): desktop-grade models, no cloud, sub-100ms
- → Delegate GPU (x402): server-side power, pay-per-token in USDC
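One way to choose among the three runtimes is a small routing policy. The sketch below assumes the app stays local by default, prefers LM Studio when one is reachable, and uses paid Delegate GPU only when the user has opted in and is online. These rules and all names are illustrative assumptions, not BlockVault's documented behavior.

```python
# Hypothetical routing policy across the three runtimes described above.
# The rules are illustrative assumptions, not BlockVault's actual logic.
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    ON_DEVICE = "litert"
    LM_STUDIO = "lan"
    DELEGATE_GPU = "x402"


@dataclass
class Context:
    needs_heavy_reasoning: bool  # e.g. multi-step agent task or large context
    lm_studio_reachable: bool    # a local LM Studio server answered on the LAN
    online: bool                 # internet connectivity is available
    delegate_opt_in: bool        # user explicitly allowed paid Delegate GPU calls


def choose_runtime(ctx: Context) -> Runtime:
    if not ctx.needs_heavy_reasoning:
        return Runtime.ON_DEVICE      # default: fastest and most private
    if ctx.lm_studio_reachable:
        return Runtime.LM_STUDIO      # larger model, still no cloud
    if ctx.online and ctx.delegate_opt_in:
        return Runtime.DELEGATE_GPU   # pay-per-token server-side inference
    return Runtime.ON_DEVICE          # offline or no opt-in: stay local
```

Because the fallback at every step is the on-device runtime, the policy never sends a prompt off the phone unless the user has explicitly allowed it.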
/ Delegated GPU via x402
When you need more power.
For complex agent tasks that need multi-hop reasoning, large context windows, or batch operations, BlockVault delegates inference to 402.blockvault.ai. You pay per token in USDC on Base via x402. No subscription, no API key, no account.
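x402 builds on the HTTP 402 Payment Required status code. The server answers an unpaid request with its payment requirements, and the client retries with a signed payment attached. The sketch below simulates that handshake against a local stub, so it runs without a network. The `X-PAYMENT` header name follows the public x402 spec as we understand it. Every other field, path, and price is a hypothetical placeholder, not the 402.blockvault.ai API.

```python
# Simulated x402 round trip: request -> 402 with payment requirements -> retry with
# a payment header -> 200. No network calls; the server is a local stub.
# Paths, field names, and the price are hypothetical placeholders.
import base64
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Response:
    status: int
    body: dict
    headers: Dict[str, str] = field(default_factory=dict)


PRICE_USDC_UNITS = 5_000  # hypothetical: 0.005 USDC (USDC uses 6 decimals)


def stub_server(path: str, headers: Dict[str, str]) -> Response:
    payment = headers.get("X-PAYMENT")
    if payment is None:
        # Unpaid request: advertise what payment is required.
        return Response(402, {"accepts": [{"network": "base", "asset": "USDC",
                                           "maxAmountRequired": str(PRICE_USDC_UNITS)}]})
    payload = json.loads(base64.b64decode(payment))
    if int(payload["amount"]) < PRICE_USDC_UNITS:
        return Response(402, {"error": "insufficient payment"})
    return Response(200, {"completion": "<model output>"})


def sign_payment(requirement: dict) -> str:
    # Placeholder for the wallet signing a USDC transfer authorization on Base;
    # in a real client the user would approve this spend first.
    payload = {"network": requirement["network"], "amount": requirement["maxAmountRequired"]}
    return base64.b64encode(json.dumps(payload).encode()).decode()


def delegate_inference(path: str) -> Response:
    first = stub_server(path, {})
    if first.status != 402:
        return first
    requirement = first.body["accepts"][0]
    return stub_server(path, {"X-PAYMENT": sign_payment(requirement)})
```

With this design, the server needs no account or API key. Each request carries its own payment, which matches the "no subscription, no API key, no account" model described above.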
How Delegate GPU works →
/ On-device vs cloud
Why local inference wins.
| Dimension | On-device (BlockVault) | Cloud AI (typical) |
|---|---|---|
| Data privacy | Prompts never leave the phone | Sent to third-party servers |
| Cost | Free (hardware you own) | Per-token API fees |
| Latency | < 50ms first token | 200–800ms network round-trip |
| Offline capable | Yes, fully functional | No, requires internet |
| Data control | Self-custody, you own all data | Provider's terms of service apply |
/ FAQ
On-device AI questions.
- What AI models does BlockVault run on-device?
- BlockVault ships with Gemma 4 E2B (about 2 billion effective parameters) and Gemma 4 E4B (about 4 billion effective parameters), both optimized for mobile via LiteRT quantization. You can also connect a local LM Studio instance for larger models.
- Does BlockVault send my data to a cloud AI?
- No. On-device inference runs entirely on your phone's CPU/GPU. Your prompts, wallet balances, and transaction history never leave the device. If you opt into Delegate GPU, only the specific prompt is sent. Your keys and balances stay local.
- Can I use AI features without internet?
- Yes. On-device inference works fully offline. You can review transactions, detect phishing links, validate addresses, and query your portfolio without any network connection.
- How does on-device AI compare to ChatGPT or cloud APIs?
- On-device models are smaller (2–4B params vs. 100B+), but they run with no network round-trip and complete privacy. For most wallet tasks (transaction review, scam detection, quick Q&A), they perform comparably. For complex multi-step reasoning, BlockVault lets you delegate to a GPU server via x402 and pay only for what you use.
- What is Delegate GPU and how do I pay for it?
- Delegate GPU sends your prompt to 402.blockvault.ai, a server running larger models on dedicated GPUs. You pay per token in USDC on Base via x402, typically $0.001–$0.01 per response. No subscription, no API key.
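To put the quoted $0.001–$0.01 per response in concrete terms, the sketch below converts a USDC amount into on-chain base units (USDC uses 6 decimal places) and estimates a monthly total. The helper names and the usage level in the example are hypothetical.

```python
# Convert a per-response USDC price into on-chain base units and estimate a monthly bill.
# USDC uses 6 decimal places, so 1 USDC = 1_000_000 base units.
from decimal import Decimal, ROUND_UP

USDC_DECIMALS = 6


def to_base_units(usdc: Decimal) -> int:
    """Round up to the smallest USDC unit so a payment is never short."""
    scaled = usdc * (Decimal(10) ** USDC_DECIMALS)
    return int(scaled.to_integral_value(rounding=ROUND_UP))


def monthly_cost(per_response_usdc: Decimal, responses_per_day: int, days: int = 30) -> Decimal:
    """Total USDC spent over `days` at a fixed price per delegated response."""
    return per_response_usdc * responses_per_day * days
```

At the top of the quoted range, $0.01 is 10,000 base units. A hypothetical 20 delegated responses a day for 30 days would then cost 6 USDC.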
/ Get started
AI that respects your privacy.
Download BlockVault and run AI on your phone. No cloud, no subscriptions, no data leaks. Free on Android.