Professional Edge AI Benchmarks
Practical benchmarks for production SLM deployment. Evaluate what matters: function calling, JSON extraction, and intent classification across edge platforms.
SLM-Bench is CycleCore's edge AI evaluation initiative that provides rigorous, transparent benchmarks for Small Language Models.
While academic benchmarks focus on general capabilities, SLM-Bench evaluates what matters for production deployment: function calling, JSON extraction, and intent classification across Raspberry Pi, laptops, and browsers.
The Problem: Existing benchmarks don't reflect real-world edge AI challenges. Academic tasks like MMLU and HellaSwag don't test function calling. No standardized energy measurement. No cross-platform validation.
Our Solution: Practical benchmarks that measure what matters for production. Independent evaluation service. Transparent methodology. Public leaderboard.
Practical tasks for production deployment: EdgeJSON (extraction), EdgeIntent (classification), EdgeFuncCall (tool use). Open-source, reproducible.
Compare 10+ SLMs across benchmark tasks. Free access. Transparent methodology. No pay-to-play rankings.
Get your SLM independently evaluated. Detailed reports, energy measurement, cross-platform testing. $2.5K-$7.5K per model.
Standardized power consumption testing with Joulescope hardware. Joules per task, tokens per joule, cost per 1M tokens.
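A minimal sketch of how these energy metrics relate arithmetically, using made-up measurement values and an assumed electricity price; this is illustrative, not the SLM-Bench measurement harness itself:

```python
# Illustrative calculation of SLM-Bench-style energy metrics from a
# hypothetical Joulescope capture. All numbers below are made up.

avg_power_w = 6.2          # mean power draw during inference (watts)
task_duration_s = 1.8      # wall-clock time for one task (seconds)
tokens_generated = 120     # output tokens produced for the task
electricity_usd_per_kwh = 0.15  # assumed electricity price

joules_per_task = avg_power_w * task_duration_s   # W * s = J
tokens_per_joule = tokens_generated / joules_per_task

# Energy cost of generating 1M tokens at this efficiency.
joules_per_million_tokens = 1_000_000 / tokens_per_joule
kwh_per_million_tokens = joules_per_million_tokens / 3_600_000  # 1 kWh = 3.6 MJ
cost_per_million_tokens = kwh_per_million_tokens * electricity_usd_per_kwh

print(f"{joules_per_task:.1f} J/task, {tokens_per_joule:.1f} tok/J, "
      f"${cost_per_million_tokens:.4f} per 1M tokens (energy only)")
```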
Evaluate on Raspberry Pi 5, mid-range laptops, and in browsers (WebGPU). Real-world deployment scenarios.

CycleCore-certified fine-tuned SLMs. SmolLM2, Qwen2.5, Llama 3.2. Compare against production-ready baselines.
Open-source models and research
CycleCore Maaza Series
135M-parameter micro language model for edge JSON extraction. Perfect accuracy on simple schemas (2-4 fields); deployable on Raspberry Pi and in browsers.
Intended Hardware: Raspberry Pi, browsers (WebGPU), mobile devices
CycleCore Maaza Series
360M-parameter small language model for high-accuracy JSON extraction. Handles complex nested structures (8+ fields); production-ready for most use cases.
Intended Hardware: Laptops, edge servers, mid-range devices
Baseline Model
500M-parameter general-purpose language model from Alibaba Cloud. Tested on EdgeJSON to establish baseline performance for community models.
Intended Hardware: General-purpose CPU/GPU (flexible deployment)
Models available on HuggingFace. See SLM-Bench.com leaderboard for benchmarks and comparisons.
Note: All benchmark results reflect testing on the same hardware for fair comparison. "Intended Hardware" indicates each model's target deployment environment.
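A minimal sketch of running a Maaza-style model for JSON extraction with Hugging Face transformers. The repository ID below is a placeholder, not a confirmed model name; check the CycleCore org on HuggingFace and the SLM-Bench.com leaderboard for actual identifiers.

```python
# Sketch only: loading an edge JSON-extraction SLM via transformers.
from transformers import pipeline

extractor = pipeline(
    "text-generation",
    model="cyclecore/maaza-135m",  # placeholder model ID, not a confirmed repo
)

prompt = (
    "Extract the following fields as JSON: name, email.\n"
    "Text: Contact Jane Doe at jane@example.com about the invoice.\n"
    "JSON:"
)
result = extractor(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```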
1,000+ real-world JSON schemas of diverse complexity. Tests schema compliance, field accuracy, and error handling.
Baselines: Qwen2.5-0.5B (evaluated), SmolLM2-1.7B, Qwen2.5-1.5B
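To make those metrics concrete, here is a sketch of how EdgeJSON-style scoring could work: schema compliance (does the output parse and contain exactly the required fields?) and per-field accuracy against a gold reference. The function and example data are illustrative, not the official SLM-Bench harness.

```python
# Illustrative EdgeJSON-style scoring: schema compliance + field accuracy.
import json

def score_extraction(model_output: str, gold: dict) -> dict:
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        # Unparseable output fails both schema compliance and field accuracy.
        return {"schema_compliant": False, "field_accuracy": 0.0}

    schema_compliant = set(predicted) == set(gold)
    correct = sum(1 for k, v in gold.items() if predicted.get(k) == v)
    return {
        "schema_compliant": schema_compliant,
        "field_accuracy": correct / len(gold),
    }

gold = {"name": "Jane Doe", "email": "jane@example.com"}
print(score_extraction('{"name": "Jane Doe", "email": "jane@doe.com"}', gold))
# -> {'schema_compliant': True, 'field_accuracy': 0.5}
```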
50-200 class taxonomy at enterprise scale. Few-shot and zero-shot variants. Measures accuracy, latency, and energy per inference.
Baselines: SmolLM2-360M, Qwen2.5-0.5B
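A minimal sketch of an EdgeIntent-style zero-shot evaluation loop measuring accuracy and per-inference latency. The `classify` function, taxonomy, and examples are hypothetical stand-ins for a real model call and dataset.

```python
# Illustrative EdgeIntent-style loop: accuracy and latency per inference.
import time

TAXONOMY = ["billing_question", "cancel_subscription", "technical_support"]

def classify(text: str) -> str:
    # Placeholder: replace with a real SLM call that maps text to one label.
    return "technical_support"

examples = [
    ("My app crashes on startup", "technical_support"),
    ("Please cancel my plan", "cancel_subscription"),
]

correct, latencies = 0, []
for text, label in examples:
    start = time.perf_counter()
    prediction = classify(text)
    latencies.append(time.perf_counter() - start)
    correct += prediction == label

print(f"accuracy={correct / len(examples):.2f}, "
      f"mean latency={sum(latencies) / len(latencies) * 1000:.2f} ms")
```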
Multi-turn tool use scenarios. Tests parameter extraction accuracy and error recovery with realistic APIs.
Baselines: Llama 3.2 3B, custom distilled models
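A sketch of one EdgeFuncCall-style check: given a tool schema and a model's proposed call, verify the function name and required parameters. The `get_weather` schema and model output here are illustrative, not part of the actual benchmark.

```python
# Illustrative EdgeFuncCall-style check of a single proposed tool call.
import json

TOOL_SCHEMA = {
    "name": "get_weather",
    "required": ["city", "unit"],
}

def check_call(model_output: str) -> bool:
    """Return True if the call names the right tool and supplies all required args."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    if call.get("name") != TOOL_SCHEMA["name"]:
        return False
    args = call.get("arguments", {})
    return all(param in args for param in TOOL_SCHEMA["required"])

model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon", "unit": "celsius"}}'
print(check_call(model_output))  # True
```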
$20 per model
$79 for 5 models (21% savings)
$499 per month
Independent evaluation. Transparent methodology. No pay-to-play rankings.
Validate your model's production readiness. Get independent evaluation on practical tasks. Benchmark against leading SLMs.
Make informed SLM procurement decisions. Compare models on tasks that matter for your deployment. Verify vendor claims.
Showcase your platform's AI capabilities. Get independent validation of performance and energy efficiency.
Access open-source benchmark suite. Reproduce results. Contribute to edge AI evaluation methodology.
We publish our methodology. We don't accept payment for rankings. We open-source our benchmark suite.
Evaluation-as-a-Service, not pay-to-play.
Visit SLM-Bench.com to view the leaderboard, explore benchmarks, and request professional evaluation.
Visit SLM-Bench.com →
Want to learn more about our evaluation service or benchmark methodology? Get in touch.