Benchmarking AI Agent Accuracy and Productivity