Advanced Model Evaluation Strategies for AI Agents