🧪 Experiments Lab

Model benchmarks, cascade routing, and AI research

12 Experiments
6 Models Tested
3 Cascade Routes
85% Avg Accuracy

🎯 Benchmark Winner: Experiment Design

deepseek-v3.2 👑 Champion

Consistently designs the most creative and insightful tests

Wins 3
Category Experiment Design

๐Ÿ† Recent Discoveries

Live

🔀 Cascade Routing

Active

5-Tier Cascade System

Routes each query to the optimal model based on its complexity

Tier  Model                     Cost    Accuracy  Use Case
1     phi4-mini (local)         FREE    70%       Simple math
2     gemma3:27b (cloud)        FREE    100%      General queries
3     qwen3-coder:480b (cloud)  FREE    100%      Coding tasks
4     Claude Haiku              $0.003  100%      Security/critical
5     Claude Sonnet/Opus        $$$     100%      Complex/creative

Recent Routes

16:24:25 local What is 15 + 27?
16:24:29 cloud-coder Write a Python function to sort a list
16:24:36 haiku Is this code vulnerable to SQL injection...
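
The routing rule itself is not shown on this page. As a minimal sketch of how a query could be mapped to one of the five tiers, the Python below uses plain keyword matching. The keyword lists, the `route` function, and the route labels for tiers 2 and 5 (`cloud`, `premium`) are assumptions made for illustration; only the labels `local`, `cloud-coder`, and `haiku` appear in the log above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    route: str   # label as it would appear in the route log
    model: str


# Models come from the tier table; the "cloud" and "premium" labels are hypothetical.
TIERS = [
    Tier(1, "local", "phi4-mini"),
    Tier(2, "cloud", "gemma3:27b"),
    Tier(3, "cloud-coder", "qwen3-coder:480b"),
    Tier(4, "haiku", "Claude Haiku"),
    Tier(5, "premium", "Claude Sonnet/Opus"),
]

# Hypothetical keyword heuristics; a real router may use a classifier instead.
SECURITY_WORDS = ("vulnerable", "injection", "exploit", "security")
CREATIVE_WORDS = ("essay", "story", "architecture", "brainstorm")
CODING_WORDS = ("function", "python", "code", "bug", "compile")
ARITHMETIC = re.compile(r"^\s*what is\s+[\d\s+\-*/().]+\??\s*$", re.IGNORECASE)


def route(query: str) -> Tier:
    """Pick a tier for a query, checking the most critical categories first."""
    text = query.lower()
    if any(word in text for word in SECURITY_WORDS):
        return TIERS[3]  # security/critical
    if any(word in text for word in CREATIVE_WORDS):
        return TIERS[4]  # complex/creative
    if any(word in text for word in CODING_WORDS):
        return TIERS[2]  # coding tasks
    if ARITHMETIC.match(query):
        return TIERS[0]  # simple math
    return TIERS[1]      # general queries


if __name__ == "__main__":
    for q in (
        "What is 15 + 27?",
        "Write a Python function to sort a list",
        "Is this code vulnerable to SQL injection...",
    ):
        tier = route(q)
        print(f"{tier.route:12} {tier.model:20} {q}")
```

Security keywords are checked before coding keywords so that a query mentioning both code and a vulnerability goes to the security tier, which matches the third logged route.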

📊 Model Benchmarks

Feb 5, 2026
phi4-mini (Local) Tested

2.5GB • ~5s/response • CPU

Accuracy 70%
Speed ~5 sec
Cost FREE
gemma3:27b (Cloud) Tested

Ollama Cloud • Free tier

Accuracy 100%
Speed ~3 sec
Cost FREE
qwen3-coder:480b (Cloud) Tested

Ollama Cloud • Coding specialist

Accuracy 100%
Speed ~5 sec
Cost FREE
deepseek-v3.2 (Cloud) Tested

671B params • Reasoning

Accuracy 100%
Speed ~8 sec
Cost FREE
qwen2.5-coder:3b (Local) Tested

1.9GB • Local coding

Accuracy 40%
Speed ~3.7 sec
Cost FREE
mxbai-embed-large Tested

669MB • Embeddings

Retrieval 100%
Speed 12.9 docs/s
Cost FREE
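
The 85% average accuracy in the page summary is consistent with a plain mean over the six cards above, if the embedding model's 100% retrieval score is counted alongside the accuracy figures. That reading is an assumption, since the page does not say how the average is computed. A short Python check:

```python
from statistics import mean

# Scores copied from the model cards; mxbai-embed-large reports retrieval, not accuracy.
scores = {
    "phi4-mini": 70,
    "gemma3:27b": 100,
    "qwen3-coder:480b": 100,
    "deepseek-v3.2": 100,
    "qwen2.5-coder:3b": 40,
    "mxbai-embed-large": 100,
}

average_all = mean(scores.values())

# Same mean without the embedding model, for comparison.
chat_only = [v for name, v in scores.items() if name != "mxbai-embed-large"]
average_chat = mean(chat_only)

if __name__ == "__main__":
    print(f"All six models: {average_all}%")
    print(f"Without the embedding model: {average_chat}%")
```

Leaving out the embedding model gives 82%, so the headline figure depends on whether retrieval is counted as accuracy.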

⚡ Prompt Powerups

Completed
Chain of Thought (CoT)

Step-by-step reasoning

Improvement +15%
Best for Logic, Math
Structured Output

JSON format enforcement

Improvement +20%
Best for Data extraction
Role Prompting

"You are an expert..."

Improvement +5%
Best for Domain tasks
Verify Step

"Check your answer"

Improvement +10%
Best for Math, Code
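
As a rough illustration of how the four powerups could be combined in one prompt, the Python sketch below builds a prompt string from optional parts. The `build_prompt` function, its parameters, and the exact instruction wording are hypothetical; the page gives only the short quoted fragments above.

```python
from typing import Optional, Sequence


def build_prompt(
    task: str,
    *,
    role: Optional[str] = None,
    chain_of_thought: bool = False,
    json_keys: Optional[Sequence[str]] = None,
    verify: bool = False,
) -> str:
    """Assemble a prompt, adding one line per enabled powerup."""
    parts = []
    if role:
        # Role prompting: "You are an expert..."
        parts.append(f"You are an expert {role}.")
    parts.append(task)
    if chain_of_thought:
        # Chain of Thought: ask for step-by-step reasoning.
        parts.append("Think through the problem step by step before answering.")
    if json_keys:
        # Structured output: request a fixed JSON shape.
        keys = ", ".join(json_keys)
        parts.append(f"Respond with a single JSON object containing exactly these keys: {keys}.")
    if verify:
        # Verify step: "Check your answer"
        parts.append("Check your answer before giving the final result.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_prompt(
            "What is 15 + 27?",
            role="mathematician",
            chain_of_thought=True,
            json_keys=["answer"],
            verify=True,
        )
    )
```

Keeping each powerup behind its own flag makes it possible to measure them one at a time, which is how the per-technique improvements listed above would need to be attributed.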