/ On-device AI · LiteRT

AI that runs on your phone.

BlockVault ships Gemma 4 models that run entirely on your device. Your prompts, transactions, and private keys never leave your phone. Need more power? Delegate to a GPU via x402 and pay only for what you use.

Get it on Google Play · Get it on the App Store

/ What is on-device AI

Your AI, your hardware.

On-device AI means running large language models on your phone's own processor: no internet, no cloud API, no third-party servers. BlockVault uses LiteRT (the successor to TensorFlow Lite) to execute quantized Gemma 4 models with hardware acceleration. Every inference happens locally, so your data stays private by design, not by policy.

Bundled models

Gemma 4 E2B — Ultra-fast for transaction review, address validation, and quick Q&A. Runs on any phone with 4 GB RAM.
Gemma 4 E4B — Full agent capabilities: multi-step reasoning, scam detection, skill execution. Requires 6 GB RAM.
LM Studio — Connect to a local LM Studio instance on your LAN for desktop-grade models without cloud latency.
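The RAM figures above imply a simple selection rule. Here is a minimal sketch in Python, assuming the app checks total device memory before loading a model. The function name `pick_bundled_model` and the model identifiers are hypothetical and are not BlockVault's actual API.

```python
from typing import Optional

# Minimum RAM per bundled model, taken from the list above (in GB).
# The identifiers are illustrative placeholders, ordered most capable first.
MODEL_MIN_RAM_GB = [
    ("gemma-4-e4b", 6),  # full agent capabilities
    ("gemma-4-e2b", 4),  # fast transaction review and quick Q&A
]


def pick_bundled_model(device_ram_gb: float) -> Optional[str]:
    """Return the most capable bundled model the device can hold, or None."""
    for model_id, min_ram_gb in MODEL_MIN_RAM_GB:
        if device_ram_gb >= min_ram_gb:
            return model_id
    # Below 4 GB no bundled model fits; LM Studio over the LAN is still an option.
    return None


print(pick_bundled_model(8))  # gemma-4-e4b
print(pick_bundled_model(4))  # gemma-4-e2b
print(pick_bundled_model(3))  # None
```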

/ Privacy by architecture

Zero data leaves your device.

Most AI wallets send your transaction history, balances, and prompts to a cloud API. BlockVault doesn't. Inference runs on your hardware, private keys live in hardware-backed storage, and the network only sees transactions you approve.

No cloud prompts

Your questions and wallet context stay on-device. Nothing is sent to OpenAI, Google, or any third-party inference API.

Hardware-backed keys

Private keys are stored in the Android Keystore (TEE/StrongBox). The AI model can't access raw keys; it can only submit signing requests, and nothing is signed until you approve it.

Works offline

On-device inference works without internet. Review transactions, detect phishing, and query your balances even in airplane mode.

/ Local inference

Three runtimes, one wallet.

BlockVault supports three inference modes so you can choose the right balance of speed, privacy, and capability. The options range from a 2B model that answers in milliseconds to a full GPU server for complex multi-step agent tasks.

→ On-device (LiteRT): no network latency, fully offline, max privacy
→ LM Studio (LAN): desktop-grade models, no cloud, sub-100ms
→ Delegate GPU (x402): server-side power, pay-per-token in USDC
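One way to picture how a wallet could choose among the three modes is a small routing function. This is a sketch of one possible policy, not BlockVault's documented behavior: the `Context` fields and the "most private runtime first" ordering are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    ON_DEVICE = "on-device (LiteRT)"
    LM_STUDIO = "LM Studio (LAN)"
    DELEGATE_GPU = "Delegate GPU (x402)"


@dataclass
class Context:
    needs_heavy_reasoning: bool  # e.g. multi-hop reasoning or a large context
    lan_server_reachable: bool   # an LM Studio instance answered on the LAN
    internet_available: bool
    delegate_opt_in: bool        # the user explicitly enabled Delegate GPU


def choose_runtime(ctx: Context) -> Runtime:
    """Prefer the most private runtime that can handle the task (assumed policy)."""
    if not ctx.needs_heavy_reasoning:
        return Runtime.ON_DEVICE
    if ctx.lan_server_reachable:
        return Runtime.LM_STUDIO
    if ctx.internet_available and ctx.delegate_opt_in:
        return Runtime.DELEGATE_GPU
    # Offline or not opted in: fall back to the bundled model.
    return Runtime.ON_DEVICE


print(choose_runtime(Context(False, True, True, True)).value)   # on-device (LiteRT)
print(choose_runtime(Context(True, False, True, True)).value)   # Delegate GPU (x402)
print(choose_runtime(Context(True, False, False, True)).value)  # on-device (LiteRT)
```

Under this policy a prompt only leaves the phone when the task needs it and the user has opted in, which matches the privacy ordering of the list above.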

/ Delegated GPU via x402

When you need more power.

For complex agent tasks like multi-hop reasoning, large context windows, or batch operations, BlockVault delegates inference to 402.blockvault.ai. You pay per token in USDC on Base via x402. No subscription, no API key, no account.

How Delegate GPU works →
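The pay-per-request pattern behind x402 can be sketched without touching a network: the server answers an unpaid request with HTTP 402 and its payment requirements, and the client retries with a signed payment attached. The simulation below is a simplified illustration only. The `fake_server` function, the price, the spending cap, and the trimmed-down payload fields are assumptions, and the signature is a placeholder rather than a real USDC authorization.

```python
import base64
import json

PRICE_MICRO_USDC = 2_000  # hypothetical quote: $0.002 in 6-decimal USDC units


def fake_server(headers: dict) -> tuple:
    """Stand-in for the GPU server: demand payment first, then answer."""
    if "X-PAYMENT" not in headers:
        requirements = {"accepts": [{
            "scheme": "exact",
            "network": "base",
            "asset": "USDC",
            "maxAmountRequired": str(PRICE_MICRO_USDC),
        }]}
        return 402, requirements
    payment = json.loads(base64.b64decode(headers["X-PAYMENT"]))
    if int(payment["amount"]) < PRICE_MICRO_USDC:
        return 402, {"error": "insufficient payment"}
    return 200, {"completion": "(model output)"}


def sign_payment(requirement: dict) -> str:
    """Placeholder for wallet signing; a real client would sign a USDC transfer."""
    payload = {"amount": requirement["maxAmountRequired"], "signature": "0xSIMULATED"}
    return base64.b64encode(json.dumps(payload).encode()).decode()


def delegated_inference(max_micro_usdc: int) -> dict:
    """Request, read the 402 quote, enforce a spending cap, pay, retry."""
    status, body = fake_server({})
    if status != 402:
        return body
    requirement = body["accepts"][0]
    if int(requirement["maxAmountRequired"]) > max_micro_usdc:
        raise ValueError("quoted price exceeds the user's spending cap")
    status, body = fake_server({"X-PAYMENT": sign_payment(requirement)})
    if status != 200:
        raise RuntimeError(f"payment rejected: {body}")
    return body


print(delegated_inference(max_micro_usdc=10_000))  # {'completion': '(model output)'}
```

Because the payment travels with the request itself, the client never needs an account or API key; the spending cap in the sketch is one way a wallet could keep the user in control of each charge.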

/ On-device vs cloud

Why local inference wins.

Dimension	On-device (BlockVault)	Cloud AI (typical)
Data privacy	Prompts never leave the phone	Sent to third-party servers
Cost	Free (hardware you own)	Per-token API fees
Latency	< 50 ms first token	200–800 ms network round-trip
Offline capable	Yes, fully functional	No, requires internet
Data control	Self-custody, you own all data	Provider's terms of service apply

/ FAQ

On-device AI questions.

What AI models does BlockVault run on-device?

BlockVault ships with Gemma 4 E2B (2 billion parameters) and Gemma 4 E4B (4 billion parameters), both optimized for mobile via LiteRT quantization. You can also connect a local LM Studio instance for larger models.

Does BlockVault send my data to a cloud AI?

No. On-device inference runs entirely on your phone's CPU/GPU, so your prompts, wallet balances, and transaction history never leave the device. If you opt into Delegate GPU, only the specific prompt is sent to the server; your keys and balances stay local.

Can I use AI features without internet?

Yes. On-device inference works fully offline. You can review transactions, detect phishing links, validate addresses, and query your portfolio without any network connection.

How does on-device AI compare to ChatGPT or cloud APIs?

On-device models are smaller (2–4B params vs. 100B+), but they run with no network latency and keep every prompt on your phone. For most wallet tasks (transaction review, scam detection, quick Q&A), they perform comparably. For complex multi-step reasoning, BlockVault lets you delegate to a GPU server via x402 and pay only for what you use.

What is Delegate GPU and how do I pay for it?

Delegate GPU sends your prompt to 402.blockvault.ai, a server running larger models on dedicated GPUs. You pay per token in USDC on Base via x402, typically $0.001–$0.01 per response. No subscription, no API key.
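To put the quoted per-response range in perspective, here is a short worked estimate. It uses integer micro-USDC (USDC has six decimal places) to avoid floating-point rounding. The figure of 100 delegated responses is an arbitrary example, and the helper names are hypothetical.

```python
MICRO = 1_000_000  # 1 USDC = 1,000,000 micro-USDC (six decimal places)

LOW_PER_RESPONSE = 1_000    # $0.001, the low end of the quoted range
HIGH_PER_RESPONSE = 10_000  # $0.01, the high end of the quoted range


def spend_range_micro(responses: int) -> tuple:
    """Return (low, high) total spend in micro-USDC for a number of responses."""
    return responses * LOW_PER_RESPONSE, responses * HIGH_PER_RESPONSE


def fmt_usdc(micro: int) -> str:
    """Format a micro-USDC amount as dollars with six decimals."""
    return f"${micro // MICRO}.{micro % MICRO:06d}"


low, high = spend_range_micro(100)
print(fmt_usdc(low), "to", fmt_usdc(high))  # $0.100000 to $1.000000
```

At the quoted rates, 100 delegated responses would cost between ten cents and one dollar.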

/ Get started

AI that respects your privacy.

Download BlockVault and run AI on your phone. No cloud, no subscriptions, no data leaks. Free on Android.
