Netdata alert web_service_slow is too aggressive
The alert is currently defined like this:
template: web_service_slow
families: *
on: httpcheck.responsetime
lookup: average -5m unaligned of time
units: ms
every: 10s
warn: ($this > ($1h_web_service_response_time * 4) )
crit: ($this > ($1h_web_service_response_time * 6) )
info: average response time over the last 5 minutes, compared to the average over the last hour
delay: down 5m multiplier 1.5 max 1h
to: webmaster
The original from Netdata is even more aggressive:
template: web_service_slow
families: *
on: httpcheck.responsetime
lookup: average -3m unaligned of time
units: ms
every: 10s
warn: ($this > ($1h_web_service_response_time * 2) )
crit: ($this > ($1h_web_service_response_time * 3) )
info: average response time over the last 5 minutes, compared to the average over the last hour
delay: down 5m multiplier 1.5 max 1h
to: webmaster
My interpretation of warn: ($this > ($1h_web_service_response_time * 4) ) is that we get a warning if the current response time (= $this) is more than 4 times the average response time over the last hour. That would mean that if we get a critical alert with a value of 90ms, the average of the last hour must have been less than 15ms.
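To make the arithmetic concrete, here is a small Python sketch of how I read the two expressions in my current config. It only models the comparisons as written (multipliers 4 and 6). The function names are made up for illustration, and it ignores anything else Netdata may apply on top, such as the delay line.

```python
def alert_level(this_ms: float, hourly_avg_ms: float,
                warn_mult: float = 4, crit_mult: float = 6) -> str:
    """Evaluate the warn/crit expressions as I read them (hypothetical helper)."""
    # crit: ($this > ($1h_web_service_response_time * 6))
    if this_ms > hourly_avg_ms * crit_mult:
        return "CRITICAL"
    # warn: ($this > ($1h_web_service_response_time * 4))
    if this_ms > hourly_avg_ms * warn_mult:
        return "WARNING"
    return "CLEAR"


def hourly_avg_bound(this_ms: float, multiplier: float) -> float:
    """The 1-hour average must be below this value for the expression to be true."""
    return this_ms / multiplier


print(hourly_avg_bound(90, 6))   # bound on the hourly average for a critical at 90ms
print(alert_level(90, 14))       # hourly average just under the bound
print(alert_level(90, 15))       # hourly average exactly at the bound
```

With this reading, a critical alert at 90ms needs an hourly average below 90 / 6 = 15ms, and a warning at 90ms needs one below 90 / 4 = 22.5ms.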
Or am I reading that wrong?