Teams want to run ML in the browser for cost, privacy, and offline reasons. But the post-embedding compute (similarity matmul, ranking, clustering) tends to end up stuck on the main thread or hard-coded to WASM, which is exactly where backend dispatch matters most.
Hadron is the custom-built matmul and ranking layer that sits between Transformers.js / ONNX and the UI. The compute engine handles the K×K similarity matrix; ayoob-sort handles the top-K ranking; each picks GPU or CPU per call.
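To make the two post-embedding steps concrete, here is a minimal single-threaded sketch of a K×K similarity matrix followed by a top-K ranking. It is not Hadron's or ayoob-sort's API: the function names (`normalizeRows`, `similarityMatrix`, `topK`) are hypothetical, cosine similarity is assumed as the similarity measure, and the ranking is a plain full sort used as a stand-in.

```typescript
// Illustrative sketch only. All names are hypothetical, not the Hadron or ayoob-sort API.

/** L2-normalise each row so that a dot product equals cosine similarity. */
function normalizeRows(rows: number[][]): number[][] {
  return rows.map((row) => {
    const norm = Math.sqrt(row.reduce((sum, x) => sum + x * x, 0));
    // Leave all-zero rows unchanged to avoid dividing by zero.
    return norm === 0 ? row.slice() : row.map((x) => x / norm);
  });
}

/** K×K matrix: entry [i][j] is the cosine similarity of embeddings i and j. */
function similarityMatrix(embeddings: number[][]): number[][] {
  const unit = normalizeRows(embeddings);
  return unit.map((a) =>
    unit.map((b) => a.reduce((sum, x, d) => sum + x * b[d], 0)),
  );
}

/** Indices of the k highest scores, best first (simple full sort as a stand-in). */
function topK(scores: number[], k: number): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .sort((p, q) => q.score - p.score)
    .slice(0, k)
    .map((p) => p.index);
}
```

Ranking row `i` of the matrix with `topK` gives the nearest neighbours of item `i` (the item itself comes first, with similarity 1). Both steps here run on the CPU; in the design described above, each would be dispatched to GPU or CPU per call.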
Working translation, sentiment-classification, and semantic-search demos that run on real Hugging Face models, on whatever hardware the user happens to have.
Hadron does not claim the GPU is always fastest; it picks a backend per operation. A small attention matmul on a quantised model can lose on the GPU because of transfer cost, while the same matmul at sequence length 512 wins on the GPU by 5×. Watch the Dispatch evidence panel beside each app: every model invocation prints which path was chosen, the score, and a one-line reason.
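The trade-off between transfer cost and raw throughput can be sketched with a toy cost model. This is an assumption about how such a decision could be made, not Hadron's actual dispatcher: `chooseBackend`, `CostModel`, and every constant below are hypothetical placeholders rather than measurements, and the score is simply defined here as estimated CPU time divided by estimated GPU time.

```typescript
// Illustrative sketch only. Names and constants are hypothetical, not Hadron's dispatcher.

type Backend = "gpu" | "cpu";

interface DispatchDecision {
  backend: Backend;
  score: number;  // estimated CPU time / estimated GPU time; above 1 favours the GPU
  reason: string; // one-line explanation, in the spirit of the evidence panel
}

interface CostModel {
  cpuFlopsPerMs: number;      // sustained CPU throughput
  gpuFlopsPerMs: number;      // sustained GPU throughput
  transferBytesPerMs: number; // host-to-device and device-to-host bandwidth
  gpuFixedOverheadMs: number; // per-dispatch setup cost
}

// Placeholder values for illustration; a real system would calibrate these per device.
const EXAMPLE_MODEL: CostModel = {
  cpuFlopsPerMs: 1e6,
  gpuFlopsPerMs: 1e8,
  transferBytesPerMs: 1e6,
  gpuFixedOverheadMs: 1,
};

/** Choose a backend for an (m×k)·(k×n) float32 matmul. */
function chooseBackend(
  m: number,
  k: number,
  n: number,
  model: CostModel = EXAMPLE_MODEL,
  gpuAvailable: boolean = true,
): DispatchDecision {
  const flops = 2 * m * k * n;               // one multiply and one add per term
  const bytes = 4 * (m * k + k * n + m * n); // upload both inputs, download the result
  const cpuMs = flops / model.cpuFlopsPerMs;
  const gpuMs =
    model.gpuFixedOverheadMs +
    bytes / model.transferBytesPerMs +
    flops / model.gpuFlopsPerMs;
  const score = cpuMs / gpuMs;

  if (!gpuAvailable) {
    return { backend: "cpu", score, reason: "no GPU backend available" };
  }
  if (score > 1) {
    return { backend: "gpu", score, reason: "compute outweighs transfer and setup cost" };
  }
  return { backend: "cpu", score, reason: "transfer and setup cost outweigh GPU speedup" };
}

// Under the placeholder constants, a tiny matmul stays on the CPU
// and a 512×512×512 matmul goes to the GPU.
console.log(chooseBackend(8, 8, 8));
console.log(chooseBackend(512, 512, 512));
```

The point of the sketch is the shape of the decision, not the numbers: the fixed overhead and transfer terms dominate for small operands, so the same operation can flip backends as the sequence length grows.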