The model that powers commerce-grade AI.
ChatCast runs on a proprietary intent classifier trained specifically for ecommerce queries. Smaller, faster, cheaper, and more accurate than any general-purpose LLM on the task that drives shopping conversions.
66M parameters, beating models with 1,000× more.
We ran 1,000 real query-product pairs through our model and seven frontier LLMs. The accuracy gap held on out-of-distribution categories, evidence that the model learned the structure of product relevance rather than memorizing the training set.
| Model | In-dist accuracy | Out-of-dist accuracy | Latency | Cost / query |
|---|---|---|---|---|
| ChatCast (DistilBERT, 66M params) | 81.0% | 80.5% | 37ms | $0.00 |
| Gemini 2.5 Pro | 70.0% | 72.5% | 2,285ms | $0.24 |
| GPT-4o | 69.5% | 67.5% | 803ms | $0.48 |
| Gemini 2.5 Flash Lite | 69.5% | 64.5% | 716ms | $0.02 |
| Claude Haiku 4.5 | 68.0% | 68.5% | 804ms | $0.23 |
| Claude Opus 4.5 | 67.0% | 64.5% | 1,516ms | $3.44 |
| GPT-4o-mini | 66.0% | 64.0% | 807ms | $0.03 |
| Claude Sonnet 4.5 | 65.0% | 62.5% | 1,212ms | $0.68 |
Lower latency and cost are better. Methodology and dataset details in the full benchmark.
Built on four design choices.
Trained on real shopping data
Public ecommerce datasets gave us scale. Synthetic data balanced the long tail. Proprietary query logs gave us accuracy on the queries that actually drive conversions.
66M parameters, fine-tuned
A DistilBERT base, fine-tuned for ecommerce intent classification. Small enough to run on CPU. Specialized enough to outperform models 1,000× larger on the task that matters.
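Under the hood the classifier's contract is simple: a (query, product) pair in, one of three intent labels out. A minimal sketch of that interface — the label names match the table below, but the token-overlap heuristic is a toy stand-in for the fine-tuned DistilBERT head, not our production model:

```python
from dataclasses import dataclass

INTENT_LABELS = ("exact_match", "substitute", "irrelevant")

@dataclass
class IntentResult:
    label: str         # one of INTENT_LABELS
    confidence: float  # score for the predicted label

def classify_intent(query: str, product_title: str) -> IntentResult:
    """Stand-in for the fine-tuned model: in production the (query, product)
    pair is encoded as one sequence and scored by a 3-way classification
    head. The token-overlap heuristic here only illustrates the contract."""
    q = set(query.lower().split())
    p = set(product_title.lower().split())
    overlap = len(q & p) / max(len(q), 1)
    if overlap > 0.6:
        return IntentResult("exact_match", overlap)
    if overlap > 0.2:
        return IntentResult("substitute", overlap)
    return IntentResult("irrelevant", 1.0 - overlap)

print(classify_intent("trail running shoes", "Nike trail running shoes").label)
# exact_match
```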
Composes with frontier LLMs
Our intent model routes the query. Frontier LLMs (Gemini, Claude, GPT) handle response synthesis where their generation strength wins. Best tool for each job.
Owned, not rented
We control the weights, the training data, and the deployment. No upstream pricing changes, rate limits, or model deprecations to scramble around.
93.7% on exact matches.
The classification that drives conversions: when the model says "this is the right product," it's almost always correct.
| Intent | In-dist accuracy | Out-of-dist accuracy |
|---|---|---|
| Exact Match | 93.7% | 91.1% |
| Substitute | 59.2% | 71.8% |
| Irrelevant | 65.2% | 42.9% |
The model picks the right product. You see the lift.
11 points more accurate than GPT-4o
Routes shoppers to the product they actually want. Higher conversion, fewer wrong-product abandonments.
20× faster than frontier LLMs
37ms vs. 700–2,300ms means the response feels instant. No spinner. No drop-off while waiting.
$0 per query
Run inference on every pageview, every search, every interaction without watching the meter. Frontier-LLM pricing kills always-on UX.
0.5pt drop on unseen categories
Most LLMs lose 2–5 points on out-of-distribution queries. Our model holds at 80.5% because it learned structure rather than memorizing examples.
Specialized for routing. Composes with frontier LLMs for generation.
Our intent model handles the high-volume, high-stakes work: classifying the shopper's intent and finding the right product. Frontier LLMs handle conversational generation where their strength wins. The result is faster, cheaper, and more accurate than relying on a generalist for both jobs.
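That division of labor can be sketched as a simple router: the local classifier triages every (query, product) pair, and only the conversational path pays for a frontier-LLM call. Function names and the stubbed dependencies below are illustrative, not our production API:

```python
from typing import Callable

def route(query: str, product: str,
          classify: Callable[[str, str], str],
          synthesize: Callable[[str, str], str]) -> str:
    """Cheap local classifier runs first; the expensive frontier-LLM call
    (synthesize) is only made on the conversational path."""
    intent = classify(query, product)
    if intent == "exact_match":
        # High-confidence match: link straight to the product, no LLM call.
        return f"Showing: {product}"
    if intent == "substitute":
        # Close but not exact: let a frontier LLM explain the trade-off.
        return synthesize(query, product)
    # Irrelevant: skip the product entirely.
    return "No matching product found."

# Usage with stubbed dependencies:
reply = route(
    "trail running shoes",
    "Nike Pegasus Trail",
    classify=lambda q, p: "exact_match",   # stand-in for the 66M model
    synthesize=lambda q, p: f"{p} is close to what you asked for.",
)
print(reply)  # Showing: Nike Pegasus Trail
```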
Owned weights, EU-hosted
We host our own model. Your queries never train a third-party LLM. GDPR-compliant by default.
No upstream surprises
No rate limits, no deprecated model IDs, no overnight pricing changes. The model that runs your store today runs it tomorrow.
Continuously improved
We retrain on aggregated shopping signals across our network. Every store benefits from every search.
See the model on your store.
Connect Shopify in 15 minutes. The intent classifier is live the moment your catalog syncs — no configuration required.