Fig.inc

The Scaling Laws Are Breaking

Part 1 of a series examining the need for, and the search for, new scaling laws. This piece reflects ongoing research at Fig. We welcome discussion and collaboration as we work to formalize these observations. OpenAI's Orion consumed 15-30× more training compute than GPT-4. By the scaling laws that have

To Solve the Benchmark Crisis, Evals Must Think

GPT-4 scored 95% on HumanEval. So did Claude. So did Gemini. But your production deployment still breaks on basic customer queries. We've collectively entered what is fast becoming a dangerous phase of AI development: one in which benchmarks tell us nothing about what actually matters. Models have memorized the

Hello World!