Interfaze Leaderboard · interfaze.ai

🏆 SOB Leaderboard

The Structured Output Benchmark

A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models

📄 Paper · 💻 Code · 🤗 Dataset

Models are sorted by Overall, a record-count-weighted, coverage-adjusted aggregate across all three source modalities (text, image, audio). Value Accuracy is the primary metric: the fraction of ground-truth leaf paths where the predicted value exactly matches the ground-truth value.
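To make the Value Accuracy definition concrete, here is a minimal Python sketch of how such a score could be computed for a single record. It is an illustration under assumptions, not the benchmark's actual scoring code: the helper names `leaf_paths` and `value_accuracy` are hypothetical, leaves are taken to be any non-dict, non-list value, list elements are addressed by index, and "exactly matches" is interpreted as same type and equal value.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]
_MISSING = object()  # sentinel for paths absent from the prediction


def leaf_paths(obj: Any, prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for every leaf in a nested dict/list structure."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from leaf_paths(value, prefix + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from leaf_paths(value, prefix + (index,))
    else:
        yield prefix, obj


def value_accuracy(truth: Any, pred: Any) -> float:
    """Fraction of ground-truth leaf paths whose predicted value matches exactly."""
    truth_leaves = dict(leaf_paths(truth))
    if not truth_leaves:
        return 0.0  # assumption: a record with no leaves scores 0
    pred_leaves = dict(leaf_paths(pred))
    matches = 0
    for path, expected in truth_leaves.items():
        actual = pred_leaves.get(path, _MISSING)
        # Assumption: exact match means identical type and equal value,
        # so True does not match 1 and "Paris" does not match "paris".
        if type(actual) is type(expected) and actual == expected:
            matches += 1
    return matches / len(truth_leaves)


truth = {"name": "Ada", "tags": ["x", "y"], "address": {"city": "Paris"}}
pred = {"name": "Ada", "tags": ["x", "z"], "address": {"city": "paris"}}
print(value_accuracy(truth, pred))  # 2 of 4 leaf paths match -> 0.5
```

Because the denominator is the number of ground-truth leaf paths, extra fields in the prediction do not change the score in this sketch, while missing or mistyped fields count as misses.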