Why Your Evals Are Probably Off: Benchmarks vs. Reality

sota
58:45
Ivan is the co-founder of Pleias, an open science LLM startup.
He is a Research Professor for Semantic Data Processing at the Center for AI (CAIRO) at the University of Applied Sciences Würzburg-Schweinfurt. His interests span computational creativity, generative models, and NLP, with a strong preference for applied, empirical, and data-driven approaches.
Over the course of an hour, Ivan will discuss the challenges of benchmarking models, including where the real difficulties lie, and share his perspective on how to better align benchmarks with real-world performance. The session will conclude with an open Q&A, where you can ask him anything about benchmarks, an essential tool for assessing whether we are truly moving in the right direction.
Speaker
Ivan Yamshchikov
CTO @ Pleias
