# TokenMark

> The standard browser benchmark for local AI and WebGPU performance.

TokenMark is a free, open web application that benchmarks how fast your device can run Large Language Models (LLMs) locally in the browser using WebGPU. It measures tokens per second for both prompt processing (prefill) and text generation (decode), producing a standardized score for cross-device comparison.

## Key Facts

- URL: https://tokenmark.app
- Type: Web application (no install required)
- Technology: WebGPU via @mlc-ai/web-llm
- Cost: Free (Pro tier available)
- Supported: Any device with a WebGPU-capable browser (Chrome 113+, Edge 113+, Safari 18+)

## What TokenMark Measures

- Decode speed (tokens/second): how fast the model generates output
- Prefill speed (tokens/second): how fast the model processes input
- Time to first token (ms): latency before generation starts
- Overall score: a weighted composite of the above

## Leaderboard

Public rankings at https://tokenmark.app/leaderboard compare Apple Silicon, NVIDIA RTX, AMD RDNA, Qualcomm Snapdragon, Intel Arc, and more.

## Community

Hardware benchmarks, tips, and local AI news at https://tokenmark.app/news

## API

OpenAI-compatible local inference API at https://tokenmark.app/api

## FAQ

Q: How do I test my AI PC performance?
A: Visit tokenmark.app/bench and run the benchmark. It uses WebGPU with real LLM inference.

Q: What is a good score?
A: Entry-level hardware: 10-20 t/s. Mid-range: 30-60 t/s. High-end GPUs: 100+ t/s.

Q: Does it work on mobile?
A: Yes, on devices with WebGPU support.

Q: Is my data sent to a server?
A: No. The LLM runs entirely in your browser; only scores are uploaded, and only if you opt in.
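Since the benchmark requires a WebGPU-capable browser, you can check support up front before attempting a run. A minimal sketch (not TokenMark's actual code) using the standard `navigator.gpu` entry point, written against a structural type so it also works with a mock:

```typescript
// Minimal WebGPU feature-detection sketch. `navigator.gpu` is the standard
// WebGPU entry point; requestAdapter() resolves to null when the browser
// exposes the API but no usable GPU is available (e.g. blocklisted driver).
async function hasWebGPU(nav: {
  gpu?: { requestAdapter(): Promise<unknown | null> };
}): Promise<boolean> {
  if (!nav.gpu) return false; // browser does not expose WebGPU at all
  const adapter = await nav.gpu.requestAdapter(); // may resolve to null
  return adapter !== null;
}
```

In a real page you would call `hasWebGPU(navigator)` and, on `false`, show the user an "unsupported browser" message instead of starting the benchmark.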
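The three reported metrics (prefill t/s, decode t/s, time to first token) all derive from token counts and three timestamps. A hypothetical helper (an illustration, not TokenMark's internal code) showing how they relate:

```typescript
// Raw timing data from one hypothetical benchmark run.
interface RunTiming {
  promptTokens: number; // tokens in the input prompt
  outputTokens: number; // tokens generated by the model
  startMs: number;      // request submitted
  firstTokenMs: number; // first output token appeared
  endMs: number;        // last output token appeared
}

function metrics(t: RunTiming) {
  const prefillSec = (t.firstTokenMs - t.startMs) / 1000; // prompt processing time
  const decodeSec = (t.endMs - t.firstTokenMs) / 1000;    // generation time
  return {
    prefillTps: t.promptTokens / prefillSec, // prompt tokens processed per second
    decodeTps: t.outputTokens / decodeSec,   // output tokens generated per second
    ttftMs: t.firstTokenMs - t.startMs,      // time to first token, in ms
  };
}

// Example: 500-token prompt, 100 tokens out, first token at 250 ms, done at 2250 ms
// → prefill 2000 t/s, decode 50 t/s, TTFT 250 ms
const m = metrics({ promptTokens: 500, outputTokens: 100, startMs: 0, firstTokenMs: 250, endMs: 2250 });
```

Note that prefill t/s is typically much higher than decode t/s, since prompt tokens are processed in parallel while output tokens are generated one at a time.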
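"OpenAI-compatible" means the API accepts requests shaped like the OpenAI Chat Completions API. A sketch of building such a request; the `/v1/chat/completions` route and the model name are assumptions based on the OpenAI convention, so consult https://tokenmark.app/api for the actual base URL and supported models:

```typescript
// Build a Chat Completions-style request against an OpenAI-compatible
// endpoint. Returns the URL and fetch init rather than sending, so the
// request shape can be inspected independently of any server.
function chatRequest(baseUrl: string, model: string, prompt: string) {
  return {
    url: `${baseUrl}/v1/chat/completions`, // standard OpenAI-style route (assumed)
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model, // model identifier (hypothetical; see the API docs for real names)
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage: const req = chatRequest(base, model, "Hello");
//        const res = await fetch(req.url, req.init);
```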