/ Case study · 402.blockvault.ai

Pay-per-token GPU inference.

402.blockvault.ai is a production x402 server that sells LLM inference (Gemma 4, Llama), billed per token and paid in USDC on Base. It shows x402 working as a real payment rail in production, not just in a demo.

→ 402.blockvault.ai

/ 01

Authentication: SIWE

Users authenticate with Sign-In With Ethereum (SIWE). The wallet signs an EIP-4361 message, and the server verifies the signature and returns a JWT. Every subsequent request carries that JWT to identify the session.
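The sketch below shows the plain-text message the wallet is asked to sign, following the EIP-4361 layout. It is a minimal illustration and not BlockVault's client code. The helper name `buildSiweMessage` and all field values are hypothetical, and optional fields such as Expiration Time and Resources are left out. A production client would more likely use a maintained SIWE library.

```typescript
// Fields of an EIP-4361 (Sign-In With Ethereum) message. Optional fields
// such as Expiration Time and Resources are omitted from this sketch.
interface SiweFields {
  domain: string;     // authority requesting the sign-in, e.g. "402.blockvault.ai"
  address: string;    // EIP-55 checksummed wallet address
  statement?: string; // optional human-readable line shown to the user
  uri: string;        // resource the session is for
  chainId: number;    // EIP-155 chain ID; Base mainnet is 8453
  nonce: string;      // server-issued, at least 8 alphanumeric characters
  issuedAt: string;   // ISO 8601 timestamp
}

// Hypothetical helper: renders the text the wallet signs (EIP-191 personal_sign).
function buildSiweMessage(f: SiweFields): string {
  const lines: string[] = [
    `${f.domain} wants you to sign in with your Ethereum account:`,
    f.address,
    "",
  ];
  // The statement sits between two blank lines; when it is absent, both
  // blank lines remain, as the EIP-4361 grammar requires.
  if (f.statement !== undefined) lines.push(f.statement);
  lines.push(
    "",
    `URI: ${f.uri}`,
    "Version: 1",
    `Chain ID: ${f.chainId}`,
    `Nonce: ${f.nonce}`,
    `Issued At: ${f.issuedAt}`,
  );
  return lines.join("\n");
}
```

Before issuing the JWT, the server side (not shown) would verify the signature and check the domain, nonce and chain ID.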

/ 02

Credit billing model

Users buy credits with USDC, paid via x402. When credits run low during inference, the server returns 402 mid-stream. The wallet then tops up automatically if its policies allow, or pauses and asks the user.
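Here is a minimal sketch of the server-side bookkeeping this model implies. It assumes credits are denominated in micro-USDC (USDC's 6-decimal base unit) and uses an illustrative per-token price, because the page states neither the credit unit nor the price. `CreditLedger` is a hypothetical name. The ledger refuses a charge that would overdraw the account and reports the shortfall, and that refusal is the point where a 402 would be sent.

```typescript
// Result of charging for generated tokens: either the new balance, or a
// refusal that the server would turn into an HTTP 402.
type DebitResult =
  | { ok: true; remaining: number }
  | { ok: false; status: 402; shortfall: number };

// Hypothetical per-account ledger. Amounts are integer micro-USDC, which
// avoids floating-point drift; the price is an illustrative assumption.
class CreditLedger {
  private readonly balances = new Map<string, number>();
  private readonly pricePerToken: number;

  constructor(pricePerTokenMicroUsdc: number) {
    this.pricePerToken = pricePerTokenMicroUsdc;
  }

  balance(account: string): number {
    return this.balances.get(account) ?? 0;
  }

  // Called once an x402 payment has settled: add the paid amount.
  topUp(account: string, amountMicroUsdc: number): number {
    const next = this.balance(account) + amountMicroUsdc;
    this.balances.set(account, next);
    return next;
  }

  // Charge for a batch of tokens; never lets the balance go negative.
  debit(account: string, tokens: number): DebitResult {
    const cost = this.pricePerToken * tokens;
    const current = this.balance(account);
    if (current < cost) {
      return { ok: false, status: 402, shortfall: cost - current };
    }
    const remaining = current - cost;
    this.balances.set(account, remaining);
    return { ok: true, remaining };
  }
}
```

Debiting per batch rather than per token keeps the number of balance updates low while a stream is running. The `remaining` value is what each streamed chunk would report back to the client.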

/ 03

SSE streaming

Inference responses stream via Server-Sent Events (SSE). Each chunk includes its token count and the remaining credit balance. The client renders tokens as they arrive and shows a live cost ticker.
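On the client, the stream has to be split into SSE events, and each event's JSON then updates the ticker. The sketch below assumes a hypothetical chunk schema (`text`, `tokens`, `creditsRemaining`) because the real field names are not documented here. It also assumes LF line endings. `parseSseEvents` and `consumeStream` are illustrative names.

```typescript
// Hypothetical chunk schema; field names are assumptions.
interface InferenceChunk {
  text: string;             // newly generated text
  tokens: number;           // tokens in this chunk
  creditsRemaining: number; // balance after this chunk was charged
}

// Extracts the data payloads of complete SSE events from a text buffer.
// Events end at a blank line; multiple "data:" lines are joined with "\n";
// other lines (":" comments, event:, id:, retry:) are ignored.
// Text after the last blank line is returned as `rest` for the next read.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const value = line.slice("data:".length);
      data.push(value.startsWith(" ") ? value.slice(1) : value);
    }
    if (data.length > 0) events.push(data.join("\n"));
  }
  return { events, rest };
}

// Running totals a live cost ticker would display.
interface Ticker {
  text: string;
  tokens: number;
  creditsRemaining: number | null;
}

// Feeds decoded network chunks through the parser, buffering partial events.
function consumeStream(rawChunks: readonly string[]): Ticker {
  const ticker: Ticker = { text: "", tokens: 0, creditsRemaining: null };
  let buffer = "";
  for (const raw of rawChunks) {
    const { events, rest } = parseSseEvents(buffer + raw);
    buffer = rest;
    for (const payload of events) {
      const chunk = JSON.parse(payload) as InferenceChunk;
      ticker.text += chunk.text;
      ticker.tokens += chunk.tokens;
      ticker.creditsRemaining = chunk.creditsRemaining;
    }
  }
  return ticker;
}
```

A browser `EventSource` only issues GET requests, so a POSTed inference stream would typically be read with `fetch`, `response.body.getReader()`, and `TextDecoder.decode(bytes, { stream: true })`, which produces strings like the `rawChunks` above.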

/ 04

GPU management

The server handles GPU cold starts transparently. If no warm instance is available, the first request may take 10-30 s. Subsequent requests in the same session hit a warm KV cache and respond in under 1 s per token.
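The latency figures above can be turned into worst-case client budgets. This is a minimal sketch that treats 30 s and 1 s per token as upper bounds, which is an assumption rather than a measured guarantee. The function names are hypothetical.

```typescript
// Upper-bound assumptions taken from the figures above.
const COLD_START_MAX_MS = 30000;    // first request may take 10-30 s
const WARM_MS_PER_TOKEN_MAX = 1000; // under 1 s per token once warm

// Deadline for the first streamed token: possible cold start plus one token.
function firstTokenDeadlineMs(sessionIsWarm: boolean): number {
  return (sessionIsWarm ? 0 : COLD_START_MAX_MS) + WARM_MS_PER_TOKEN_MAX;
}

// Worst-case wall-clock budget for a whole request of up to maxTokens tokens.
function requestBudgetMs(sessionIsWarm: boolean, maxTokens: number): number {
  const startup = sessionIsWarm ? 0 : COLD_START_MAX_MS;
  return startup + maxTokens * WARM_MS_PER_TOKEN_MAX;
}
```

For a cold 256-token request, the bound is 30 000 + 256 × 1 000 = 286 000 ms. Per-token time dominates long generations, so a separate first-token deadline helps a client tell a slow cold start from a stalled stream.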

/ End-to-end flow

User opens BlockVault
  → Selects "Delegate GPU" inference mode
  → Wallet signs SIWE message (EIP-4361)
  → Server returns JWT session token

User sends prompt
  → POST /inference { model: "gemma-4-e2b", prompt: "..." }
  → Server checks credits
    ├── Credits OK → Stream inference via SSE
    └── Credits LOW → Return 402 + payment-required
        → Wallet auto-signs x402 (EIP-3009 USDC on Base)
        → Retry with payment-signature
        → Credits topped up → Stream continues
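The "Credits LOW" branch can be sketched as a client wrapper that pays once and retries. The header names `payment-required` and `payment-signature` come from the flow above. The following are all assumptions, implemented here by a hypothetical `Wallet` interface: how the header values are encoded, how the amount is read from them, and how the EIP-3009 USDC authorization is signed. The auto-pay policy shape and the Bearer transport for the JWT are also assumptions. This covers the path where the 402 arrives before streaming starts.

```typescript
// Minimal HTTP shapes so the sketch is self-contained; pass the real fetch in practice.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
}
interface HttpInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}
type HttpFetch = (url: string, init: HttpInit) => Promise<HttpResponse>;

// Hypothetical wallet capabilities; decoding and EIP-3009 signing are not shown.
interface Wallet {
  quoteMicroUsdc(paymentRequired: string): number;       // amount the 402 asks for
  signPayment(paymentRequired: string): Promise<string>; // payment-signature value
  confirmWithUser(amountMicroUsdc: number): Promise<boolean>;
}

// Hypothetical auto-pay policy, in micro-USDC (6 decimals).
interface TopUpPolicy {
  maxAutoPayMicroUsdc: number; // largest single payment signed without asking
  dailyCapMicroUsdc: number;   // total daily spend allowed without asking
  spentTodayMicroUsdc: number;
}

function mayAutoPay(policy: TopUpPolicy, amount: number): boolean {
  return (
    amount <= policy.maxAutoPayMicroUsdc &&
    policy.spentTodayMicroUsdc + amount <= policy.dailyCapMicroUsdc
  );
}

async function postWithX402(
  fetchFn: HttpFetch,
  url: string,
  jwt: string,
  payload: unknown,
  wallet: Wallet,
  policy: TopUpPolicy,
): Promise<HttpResponse> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    authorization: `Bearer ${jwt}`, // assumed transport for the SIWE-issued JWT
  };
  const init: HttpInit = { method: "POST", headers, body: JSON.stringify(payload) };

  const first = await fetchFn(url, init);
  if (first.status !== 402) return first;

  const required = first.headers.get("payment-required");
  if (required === null) return first; // nothing to pay against

  const amount = wallet.quoteMicroUsdc(required);
  const approved = mayAutoPay(policy, amount) || (await wallet.confirmWithUser(amount));
  if (!approved) return first; // user declined: surface the 402

  const signature = await wallet.signPayment(required);
  // Retry exactly once so a misbehaving server cannot trigger a payment loop.
  const retried = await fetchFn(url, {
    ...init,
    headers: { ...headers, "payment-signature": signature },
  });
  if (retried.status >= 200 && retried.status < 300) {
    policy.spentTodayMicroUsdc += amount;
  }
  return retried;
}
```

Spend is only recorded after the retried request succeeds. A failed settlement therefore does not use up the daily cap.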
