Skip to content

Agent Observability

Programmatic fleet monitoring for orchestrators and supervisory agents.

Human Oversight covers humans watching agents through the dashboard and conversational queries. This page covers the other direction: an orchestrator agent (OpenClaw, a scheduler, a supervisory loop) monitoring its fleet programmatically through CLI and JSON.

Fleet status

Check what's running, what's idle, and what's stuck:

bash
hzl agent status
json
{
  "agents": [
    {
      "agent": "clara",
      "isActive": true,
      "activeDurationMs": 720000,
      "tasks": [{
        "taskId": "abc123",
        "title": "Write blog post draft",
        "project": "writing",
        "claimedAt": "2026-03-02T10:00:00Z",
        "progress": 60,
        "leaseUntil": "2026-03-02T11:00:00Z",
        "leaseExpired": false
      }],
      "lastActivity": "2026-03-02T10:10:00Z",
      "stats": null
    },
    {
      "agent": "kenji",
      "isActive": true,
      "activeDurationMs": 2700000,
      "tasks": [{
        "taskId": "def456",
        "title": "Research competitor analysis",
        "project": "research",
        "claimedAt": "2026-03-02T09:15:00Z",
        "progress": null,
        "leaseUntil": "2026-03-02T10:15:00Z",
        "leaseExpired": true
      }],
      "lastActivity": "2026-03-02T09:20:00Z",
      "stats": null
    }
  ],
  "summary": { "total": 3, "active": 2, "idle": 1 }
}

Human-readable output (for logs or debugging):

bash
hzl agent status --format md
Agents (2 active, 1 idle):

● clara      [active 12m]  Write blog post draft (p:writing, 60%)
● kenji      [active 45m]  Research competitor analysis (p:research)
              ⚠ Lease expired 15m ago
○ writer-2   [idle 2h]

Filtering

bash
# Single agent
hzl agent status --agent clara

# Single project
hzl agent status --project writing

# Include per-agent task count breakdowns
hzl agent status --stats

With --stats, each agent includes a breakdown:

json
{
  "stats": { "total": 5, "counts": { "in_progress": 1, "done": 3, "ready": 1 } }
}

Agent activity log

Inspect what a specific agent has been doing:

bash
hzl agent log clara
json
{
  "agent": "clara",
  "events": [
    {
      "timestamp": "2026-03-02T10:10:00Z",
      "type": "status_changed",
      "taskId": "abc123",
      "taskTitle": "Write blog post draft"
    },
    {
      "timestamp": "2026-03-02T10:05:00Z",
      "type": "checkpoint_recorded",
      "taskId": "abc123",
      "taskTitle": "Write blog post draft"
    }
  ],
  "total": 47
}
bash
# Show more history (default: 50)
hzl agent log clara --limit 100

Detecting stuck agents

An agent is likely stuck when its lease has expired and no recent activity appears. The leaseExpired field in agent status surfaces this directly.

bash
# Orchestrator pseudo-logic:
# 1. Poll fleet status
# 2. For each active agent where leaseExpired is true, check activity
# 3. If no recent events, recover the task

hzl agent status
# → find agents with leaseExpired: true

hzl agent log kenji --limit 5
# → confirm no recent activity

hzl task steal <task-id> --if-expired --agent backup-agent --lease 60
# → reassign to a healthy agent

This complements hzl task stuck, which finds expired-lease tasks directly. agent status gives the fleet-level view; task stuck gives the task-level view.

Orchestrator patterns

Health-check loop

An orchestrator can poll fleet state on a schedule and take action:

bash
# Every 5 minutes, check fleet health
hzl agent status | process_fleet_health

# If an agent has been active > 2 hours with no progress change, alert
# If a lease expired > 30 minutes ago, auto-steal and reassign

Post-session audit

After an agent session ends, review what happened:

bash
hzl agent log clara --limit 20

Useful for verifying that the agent completed its assigned work, created appropriate follow-on tasks, and didn't leave tasks in a bad state.

Scaling decisions

Use --stats to inform whether to spin up or wind down agents:

bash
hzl agent status --project writing --stats
# → If many ready tasks and few active agents, scale up
# → If no ready tasks and agents are idle, scale down

When to use what

NeedTool
Orchestrator checking fleet statehzl agent status
Orchestrator investigating one agenthzl agent log <agent>
Orchestrator finding stale taskshzl task stuck
Human checking fleet at a glancehzl agent status --format md
Human investigating visuallyWeb dashboard Agent Operations view
Human asking conversationallyAsk your primary agent

HZL - Shared task ledger for OpenClaw