model evaluation

Executive Summary Benchmarks support the empirical, quantitative evaluation of progress in AI research. Although benchmarks are ubiquitous in most subfields of machine learning, they are still rare in the subfield of AI safety. In this post, I argue that creating benchmarks should be a high priority for AI safety. While this idea is not new, I think it may still be underrated. Among other benefits, benchmarks would make it much easier to:...