AI Safety & Agent Evaluation · Intermediate · Available

Measuring Agent Reliability in Long-Running Tasks

Wednesday, June 10, 2026 · 3:30 PM – 4:30 PM · Hall D1

How to design metrics that capture multi-step success, partial credit, and recovery behavior for agents that run for hours.

Topic

Reliability

Time of day

Afternoon

Speaker

Ben Carter

Company

Sentinel AI

Room

Hall D1

Session ID

S020