Building an RL Environment to Train Agents for Production Debugging
January 20, 2026 • Engineering • 10 min read

We built an RL environment for ops diagnostics across Sentry, Supabase, Railway, and Kubernetes, with 24 real production tasks for training agents to debug your stack.

The HUD Team
Evaluating Agents on Financial Analyst Workflows (SheetBench)
October 1, 2025 • Enterprise • 10 min read

A case study on developing evaluations for agent performance on finance analyst jobs.

The HUD Team, Sepal AI
HUD Autonomy: How do we evaluate and improve AI agents?
January 24, 2025 • Research • 8 min read

At HUD, our mission is to help align human and AI agents' behavior. Today, we're excited to introduce Autonomy, our comprehensive evaluation framework for AI agents.

Lorenss Martinsons, The HUD Team

© 2026 Human Union Data, Inc.