/ Case study · 402.blockvault.ai
Pay-per-token GPU inference.
402.blockvault.ai is a production x402 server that sells LLM inference (Gemma 4, Llama) per token, paid in USDC on Base. It shows x402 working as a real payment rail in production rather than as a demo.
→ 402.blockvault.ai
Authentication: SIWE
Users authenticate with Sign-In With Ethereum (SIWE). The wallet signs an EIP-4361 message and the server returns a JWT. Every subsequent request carries that JWT as its session identity.
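For reference, this is roughly what the wallet is asked to sign. It is a minimal sketch of the EIP-4361 text format, assuming a statement is always present; the statement, nonce, and address in the example are hypothetical placeholders, and the real server may rely on a SIWE library rather than building the string by hand.

```typescript
// Minimal EIP-4361 (SIWE) message builder.
// Assumption: all example values (statement, nonce, address) are placeholders,
// not values used by 402.blockvault.ai.

interface SiweFields {
  domain: string;    // authority requesting the sign-in
  address: string;   // EIP-55 checksummed wallet address
  statement: string; // human-readable line shown in the wallet (no newlines)
  uri: string;       // resource the session is for
  chainId: number;   // 8453 is Base mainnet
  nonce: string;     // server-issued, at least 8 alphanumeric characters
  issuedAt: string;  // ISO 8601 timestamp
}

function buildSiweMessage(f: SiweFields): string {
  return [
    `${f.domain} wants you to sign in with your Ethereum account:`,
    f.address,
    "",
    f.statement,
    "",
    `URI: ${f.uri}`,
    "Version: 1",
    `Chain ID: ${f.chainId}`,
    `Nonce: ${f.nonce}`,
    `Issued At: ${f.issuedAt}`,
  ].join("\n");
}

// Example with placeholder values.
const siweMessage = buildSiweMessage({
  domain: "402.blockvault.ai",
  address: "0x0000000000000000000000000000000000000001",
  statement: "Sign in to BlockVault inference.",
  uri: "https://402.blockvault.ai",
  chainId: 8453,
  nonce: "a1b2c3d4e5",
  issuedAt: "2024-01-01T00:00:00Z",
});
```

EIP-4361 messages are signed with EIP-191 personal_sign; the server then checks the signature, domain, and nonce before issuing the JWT.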
Credit billing model
Users buy credits with USDC through x402. When credits run low during inference, the server returns 402 mid-stream. The wallet then tops up automatically if its spending policies allow; otherwise it pauses and asks the user to approve the payment.
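The "if policies allow" step can be pictured as a simple wallet-side check. This is a hypothetical sketch: the two limits (a per-payment ceiling and a daily cap) are assumed policy fields chosen for illustration, not something x402 or BlockVault is documented to define.

```typescript
// Hypothetical wallet-side policy for automatic credit top-ups.
// The policy fields are illustrative assumptions, not part of x402.
// Amounts are integer USDC base units (USDC uses 6 decimals).

type TopUpDecision = "auto-sign" | "ask-user";

interface TopUpPolicy {
  maxAutoTopUp: number; // largest single payment signed without prompting
  dailyCap: number;     // total auto-signed spend allowed per day
}

const USDC_DECIMALS = 6;

// Converts a dollar amount to integer USDC base units.
function toBaseUnits(usd: number): number {
  return Math.round(usd * 10 ** USDC_DECIMALS);
}

// Auto-sign only when the payment fits both limits; otherwise pause and ask.
function decideTopUp(amount: number, spentToday: number, policy: TopUpPolicy): TopUpDecision {
  const withinSingleLimit = amount <= policy.maxAutoTopUp;
  const withinDailyCap = spentToday + amount <= policy.dailyCap;
  return withinSingleLimit && withinDailyCap ? "auto-sign" : "ask-user";
}
```

Keeping amounts as integer base units avoids floating-point drift when summing many small top-ups.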
SSE streaming
Inference responses stream via Server-Sent Events. Each chunk includes its token count and the remaining credit balance. The client renders tokens as they arrive and shows a live cost ticker.
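On the client, the stream can be consumed with a small SSE parser plus a running tally for the cost ticker. This is a sketch under one assumption: each event's data field is JSON shaped like { text, tokens, creditsRemaining }. Those field names are hypothetical, since the chunk schema is not published here.

```typescript
// Minimal SSE parser and cost-ticker tally.
// Assumption: the chunk fields below are hypothetical names.

interface InferenceChunk {
  text: string;             // decoded token text to render
  tokens: number;           // tokens in this chunk
  creditsRemaining: number; // balance after this chunk
}

// Splits a buffer into complete SSE events (separated by a blank line) and
// returns the parsed chunks plus any trailing partial event, which the caller
// prepends to the next network read.
function parseSse(buffer: string): { chunks: InferenceChunk[]; rest: string } {
  const events = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = events.pop() ?? ""; // last piece may be incomplete
  const chunks: InferenceChunk[] = [];
  for (const event of events) {
    // Join all "data:" lines, dropping one optional leading space per line.
    const data = event
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data) chunks.push(JSON.parse(data) as InferenceChunk);
  }
  return { chunks, rest };
}

// Running totals for the live cost ticker.
function tally(chunks: InferenceChunk[]): { tokens: number; creditsRemaining: number | null } {
  const tokens = chunks.reduce((sum, c) => sum + c.tokens, 0);
  const last = chunks[chunks.length - 1];
  return { tokens, creditsRemaining: last ? last.creditsRemaining : null };
}
```

Returning the unfinished tail keeps the parser correct when a network read ends partway through an event.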
GPU management
The server handles GPU cold starts transparently. If no warm instance is available, the first request may take 10 to 30 seconds. Later requests in the same session hit a warm KV cache and generate each token in under 1 second.
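Those latency figures suggest how a client might size its timeouts. The sketch below treats 30 seconds as the worst-case cold start and 1 second as the worst-case gap between tokens; the split into two deadlines and the 2-second margin are assumptions for illustration.

```typescript
// Client-side deadlines for a streaming request, derived from the quoted figures.
// Assumptions: 30 s worst-case cold start, 1 s worst-case gap between tokens,
// and an arbitrary 2 s safety margin.

const COLD_START_MAX_MS = 30_000; // upper end of the 10-30 s cold start
const TOKEN_GAP_MAX_MS = 1_000;   // warm-cache bound per token

interface StreamDeadlines {
  firstTokenMs: number; // how long to wait for the first SSE chunk
  idleMs: number;       // how long to tolerate silence between chunks
}

function streamDeadlines(warmSession: boolean, marginMs = 2_000): StreamDeadlines {
  // Only the first chunk can be delayed by a cold start.
  const firstToken = warmSession ? TOKEN_GAP_MAX_MS : COLD_START_MAX_MS;
  return {
    firstTokenMs: firstToken + marginMs,
    idleMs: TOKEN_GAP_MAX_MS + marginMs,
  };
}
```

A client would start the first-token timer when it sends the request and reset the idle timer on every chunk. Keeping the two separate stops a cold start from being mistaken for a stalled stream.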
/ End-to-end flow
User opens BlockVault
  → Selects "Delegate GPU" inference mode
  → Wallet signs SIWE message (EIP-4361)
  → Server returns JWT session token

User sends prompt
  → POST /inference { model: "gemma-4-e2b", prompt: "..." }
  → Server checks credits
      ├── Credits OK  → Stream inference via SSE
      └── Credits LOW → Return 402 + payment-required
            → Wallet auto-signs x402 payment (EIP-3009 USDC on Base)
            → Retry with payment-signature
            → Credits topped up → Stream continues
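The client half of this flow is a request, a possible 402, a signature, and one retry. The sketch below assumes the JWT travels as a Bearer token in the Authorization header and that the payment-required and payment-signature values are carried as HTTP headers with those names. The signPayment callback stands in for the wallet's EIP-3009 signing (and its policy check) and is hypothetical; send is injected so the loop does not depend on a particular HTTP client.

```typescript
// Client-side 402 retry loop for the flow above.
// Assumptions: Bearer JWT auth, header names taken from the flow diagram,
// and a hypothetical signPayment callback that returns null when the wallet
// policy or the user declines the payment.

interface SimpleResponse {
  status: number;
  header(name: string): string | null;
}

type Send = (headers: Record<string, string>) => Promise<SimpleResponse>;
type SignPayment = (paymentRequired: string) => Promise<string | null>;

async function requestWithTopUp(
  send: Send,
  jwt: string,
  signPayment: SignPayment,
  maxAttempts = 2,
): Promise<SimpleResponse> {
  const headers: Record<string, string> = { authorization: `Bearer ${jwt}` };
  let res = await send({ ...headers });
  for (let attempt = 1; attempt < maxAttempts && res.status === 402; attempt++) {
    const requirements = res.header("payment-required");
    if (!requirements) break; // malformed 402: surface it to the caller
    const signature = await signPayment(requirements);
    if (!signature) break;    // declined: return the 402 unchanged
    headers["payment-signature"] = signature;
    res = await send({ ...headers });
  }
  return res;
}
```

Capping the attempts keeps a misbehaving server from draining the wallet through repeated 402 responses.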