owl's alerter is a small fixed-interval loop that evaluates threshold rules against the same query engine the dashboards use, then posts a JSON event to a configured webhook on every state transition. There is no rule UI, no notification routing tree, no silence catalogue — the webhook is the integration point.
Rule shape
Rules live under alerts: in config.yml. The example below parses
cleanly and would run as-is once a real webhook_url is provided.
listen: "0.0.0.0:9090"
storage:
path: "/data/owl.db"
retention:
time: 7d
size: 100MB
scrape:
default_interval: 30s
default_timeout: 10s
targets:
- name: owl-self
url: "http://127.0.0.1:9090/metrics"
labels:
job: owl
dashboards:
dir: "/etc/owl/dashboards"
alerts:
webhook_url: "https://hooks.example/abc"
rules:
- name: high_cpu
expr: 'sum(rate(node_cpu_seconds_total{mode!="idle"}[1m])) / count(node_cpu_seconds_total{mode="idle"})'
op: ">"
threshold: 0.8
for: 2m
- name: low_disk
expr: "node_filesystem_avail_bytes"
op: "<"
threshold: 1073741824
for: 5m
- name: webhook_stuck
expr: "increase(owl_alerts_webhook_failures_total[10m])"
op: ">"
threshold: 0
for: 0s
Each rule has name, expr (any expression the PromQL
subset supports), op (>, >=, <, <=),
threshold, and a for duration the condition must hold before owl
marks the rule firing.
Per-series fan-out
A rule fans out across every series its expression returns. The
low_disk example above produces one independent alert per
filesystem. node_filesystem_avail_bytes{mountpoint="/"} and
{mountpoint="/data"} track separate firing / resolved lifecycles
identified by their labels. One rule covers every container, every
filesystem, every interface the matcher selects.
The dedup unit is therefore (rule_name, series_labels). The
webhook receives one firing event per crossing series, then one
resolved event per series that clears. Nothing is re-sent while a
pair stays in either state.
Webhook payload
Firing event:
{
"rule": "low_disk",
"expr": "node_filesystem_avail_bytes",
"op": "<",
"threshold": 1073741824,
"value": 536870912,
"status": "firing",
"labels": {"mountpoint": "/data", "fstype": "ext4"},
"fired_at": "2026-05-15T20:31:04Z"
}
Resolved event — same shape plus resolved_at:
{
"rule": "low_disk",
"expr": "node_filesystem_avail_bytes",
"op": "<",
"threshold": 1073741824,
"value": 1610612736,
"status": "resolved",
"labels": {"mountpoint": "/data", "fstype": "ext4"},
"fired_at": "2026-05-15T20:31:04Z",
"resolved_at": "2026-05-15T20:48:22Z"
}
The webhook URL is normally injected via OWL_ALERT_WEBHOOK_URL so
it stays out of YAML. The delivery sink is swapped atomically on
reload: in-flight POSTs finish against the old URL; the next state
transition uses the new one.
Self-instrumentation
Four counters on /metrics describe the alerter's own state:
owl_alerts_evaluations_total, owl_alerts_webhook_sends_total,
owl_alerts_webhook_failures_total and owl_alerts_firing. Wire
them back into your rules to catch a stuck loop or a broken receiver
— the third rule in the example above does exactly that. See
/metrics for the full list.