Interactive tools and simulations.
A Monte Carlo simulator that shows why Balanced Accuracy is the best metric for selecting LLM judges when the goal is prevalence estimation.
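
The intuition can be sketched in a few lines of Python. This is a minimal illustration, not the simulator itself, and it rests on assumptions the description above does not state: the judge's raw positive rate is corrected with the Rogan-Gladen estimator, the judge's sensitivity and specificity are treated as known exactly, and the two judges and all parameter values are hypothetical. Under that correction the estimate is divided by sensitivity + specificity - 1, which equals 2 * Balanced Accuracy - 1, so a judge with low Balanced Accuracy amplifies sampling noise even when its raw accuracy looks just as good.

```python
import random
import statistics


def balanced_accuracy(sensitivity, specificity):
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


def corrected_prevalence(observed_rate, sensitivity, specificity):
    """Rogan-Gladen correction of a judge's raw positive rate, clipped to [0, 1].

    Assumption: sensitivity and specificity are known exactly.
    """
    youden_j = sensitivity + specificity - 1  # equals 2 * balanced accuracy - 1
    if youden_j <= 0:
        raise ValueError("judge must be better than chance")
    estimate = (observed_rate - (1 - specificity)) / youden_j
    return min(1.0, max(0.0, estimate))


def simulate_rmse(true_prevalence, sensitivity, specificity,
                  n_items=500, n_trials=2000, seed=0):
    """Monte Carlo RMSE of the corrected prevalence estimate."""
    rng = random.Random(seed)
    # Probability that the judge flags a randomly drawn item as positive.
    p_flag = (true_prevalence * sensitivity
              + (1 - true_prevalence) * (1 - specificity))
    squared_errors = []
    for _ in range(n_trials):
        flagged = sum(rng.random() < p_flag for _ in range(n_items))
        estimate = corrected_prevalence(flagged / n_items,
                                        sensitivity, specificity)
        squared_errors.append((estimate - true_prevalence) ** 2)
    return statistics.mean(squared_errors) ** 0.5


if __name__ == "__main__":
    prevalence = 0.1
    # Hypothetical judges with nearly equal raw accuracy (about 0.92)
    # at this prevalence, but very different balanced accuracy.
    judges = {"judge_a": (0.92, 0.92), "judge_b": (0.25, 0.995)}
    for name, (sens, spec) in judges.items():
        accuracy = prevalence * sens + (1 - prevalence) * spec
        rmse = simulate_rmse(prevalence, sens, spec)
        print(f"{name}: accuracy={accuracy:.3f} "
              f"balanced_accuracy={balanced_accuracy(sens, spec):.3f} "
              f"rmse={rmse:.4f}")
```

Running the script prints raw accuracy, Balanced Accuracy, and simulated RMSE for each hypothetical judge. In this setup the two judges are nearly indistinguishable by raw accuracy, while the one with lower Balanced Accuracy gives the noisier prevalence estimate.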