Logs
The log is the record of everything the monitor does: each check it runs against a resource, each threshold it crosses, and any error it hits along the way. It is where you confirm that rules are running, and where you look first when an expected alert did not fire.
Each entry is timestamped and tagged with a type, the kind of event it records. An entry that relates to a specific resource also names the service and metric it read from. Alongside a short summary, every entry keeps its full payload, including any error message, so a check that failed or a notification that did not send shows exactly what went wrong.
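As a rough illustration of the fields described above, an entry could be modeled as follows. This is a sketch only: the class name `LogEntry`, its field names, and the `error` key in the payload are assumptions made for the example, not the monitor's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class LogEntry:
    """Hypothetical shape of one log entry; all names are illustrative."""
    timestamp: datetime                 # when the event was recorded
    entry_type: str                     # e.g. "Check Completed"
    summary: str                        # short one-line description
    service: Optional[str] = None       # set only for resource-related entries
    metric: Optional[str] = None        # set only for resource-related entries
    payload: dict[str, Any] = field(default_factory=dict)  # full detail

    def error(self) -> Optional[str]:
        """Return the error message kept in the payload, if there is one."""
        return self.payload.get("error")


# A check that could not read its metric keeps the error on the entry.
failed_check = LogEntry(
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    entry_type="Check Completed",
    summary="Could not read metric",
    service="api-gateway",
    metric="latency_ms",
    payload={"error": "service unreachable"},
)
```

Calling `failed_check.error()` returns the recorded message, while an entry with an empty payload returns `None`.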
Entry types
- Check Completed: the monitor finished evaluating a resource. Each enabled rule is checked on its own interval (one minute by default), so a healthy rule leaves a Check Completed entry on that cadence, and a gap in them is itself a signal something is wrong. The entry records the metric value that was read. If the metric could not be read, for example because the service was unreachable, the error is recorded on the entry instead.
- Threshold Crossed: a threshold was crossed during a check. The expanded row shows the metric value at the time, the threshold name and comparison that fired, and which actions ran. When a value crosses several thresholds at once, only the nearest one runs its actions; the others are recorded here as crossed but skipped, with the reason.
- Action Triggered: an action ran in response to a crossed threshold. The entry names the action and records its outcome, so you can confirm a notification was delivered or see why it failed, such as an invalid SendGrid key or an unreachable webhook URL.
- Alert Resolved: a metric that had crossed a threshold returned to the normal side of it, clearing the active alert. This is what removes a rule from the Overview's Active alerts count.
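Because a healthy rule leaves Check Completed entries on a steady cadence, gaps can be spotted mechanically. The sketch below assumes you already have the timestamps of a rule's Check Completed entries as a list; the function name `find_gaps` and the `tolerance` slack factor are hypothetical and not part of the monitor.

```python
from datetime import datetime, timedelta


def find_gaps(check_times, interval=timedelta(minutes=1), tolerance=1.5):
    """Return (start, end) pairs where consecutive checks are too far apart.

    tolerance is a hypothetical slack factor: a gap is reported only when the
    spacing exceeds interval * tolerance, so small scheduling jitter is ignored.
    """
    ordered = sorted(check_times)
    limit = interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


start = datetime(2024, 1, 1, 12, 0)
# Checks at 12:00, 12:01, 12:02, then nothing until 12:07.
times = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
gaps = find_gaps(times)
```

Here `gaps` holds a single pair, 12:02 to 12:07, which is the window to inspect for errors. The default interval matches the one-minute default; pass the rule's own interval if it differs.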
Finding entries
Use the search box to filter the list, and the time-range buttons to scope it to a window: 1h, 6h, 24h, 7d, or 30d. Narrow the window to focus on a recent incident, or widen it to confirm a rule has been running over time.
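If you work with exported entries outside the UI, the same scoping can be approximated in a few lines. This is a minimal sketch under stated assumptions: entries are plain dictionaries with `timestamp` and `summary` keys, and the search is treated as a case-insensitive substring match on the summary. Both are illustrative guesses, not a description of how the search box actually matches.

```python
from datetime import datetime, timedelta

# The five windows offered by the time-range buttons.
WINDOWS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
}


def in_window(entries, window, now, text=""):
    """Keep entries at or after now - window whose summary contains text."""
    cutoff = now - WINDOWS[window]
    needle = text.lower()  # assumed matching rule: case-insensitive substring
    return [
        e for e in entries
        if e["timestamp"] >= cutoff and needle in e["summary"].lower()
    ]


now = datetime(2024, 1, 8, 12, 0)
entries = [
    {"timestamp": now - timedelta(minutes=30), "summary": "Check Completed: cpu"},
    {"timestamp": now - timedelta(hours=5), "summary": "Threshold Crossed: cpu"},
    {"timestamp": now - timedelta(days=3), "summary": "Alert Resolved: cpu"},
]
recent = in_window(entries, "1h", now)
crossed = in_window(entries, "7d", now, text="threshold")
```

With this sample data, the 1h window keeps only the most recent check, and the 7d window combined with the search text keeps only the Threshold Crossed entry.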
Troubleshooting
A rule is enabled but no Check Completed entries appear. Check the system entries for a validation error on that rule. A threshold with no action configured is a common cause: the rule fails at load time and never starts evaluating.
An alert fired but no notification arrived. Find the Action Triggered entry for that event and expand it. Its outcome shows whether the action ran and any delivery error. Verify the action's credentials (SendGrid key, Twilio SID, webhook URL) on the Actions page.
An alert is not firing even though the metric looks high. Open a Check Completed entry for that rule and check the metric value the monitor recorded. If the recorded value is below the threshold, the rule is working correctly: the value at check time can differ from what another tool shows. If no Check Completed entries exist for the rule, see the first item in this section.
Use cases
- Confirm a new rule is being checked as expected.
- Find out why an alert did not fire, for example a metric that could not be read.
- Trace what happened around an incident, using the time and detail of each entry.