Cron Job Healthchecks
Monitoring whether our cron jobs run properly, and alerting when they do not, is the next important step toward improving the stability of our systems. The best tool we have found for this task is https://healthchecks.io: it keeps a list of all expected cron jobs on each host, together with a schedule that defines how often each job is expected to run. Each cron job pings the service after every successful run; if an expected ping does not arrive on time, the system raises an alert and pushes it to any of the configurable notification channels.
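To illustrate the alerting logic, here is a simplified model (our own sketch, not Healthchecks' actual code) of how a check with a fixed period and grace time moves between states. The status names mirror those Healthchecks displays ("new", "up", "grace", "down"); checks defined by a cron expression and the "paused" and "started" states are left out. The daily job and its timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_status(
    last_ping: Optional[datetime],
    now: datetime,
    period: timedelta,
    grace: timedelta,
) -> str:
    """Simplified status of a period-based check (illustration only)."""
    if last_ping is None:
        return "new"  # never pinged: no alert yet
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"  # next ping is not due yet
    if elapsed <= period + grace:
        return "grace"  # ping is late, but the grace time has not run out
    return "down"  # period and grace time elapsed: an alert would be sent


# Hypothetical daily job that last pinged at 03:00 UTC, with one hour of grace.
t0 = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
period, grace = timedelta(days=1), timedelta(hours=1)
print(check_status(t0, t0 + timedelta(hours=23), period, grace))  # up
print(check_status(t0, t0 + timedelta(hours=24, minutes=30), period, grace))  # grace
print(check_status(t0, t0 + timedelta(hours=26), period, grace))  # down
```

The grace time is what keeps a job that merely runs a few minutes longer than usual from triggering an alert.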
Healthchecks can be self-hosted (see https://github.com/healthchecks/healthchecks), and our first tests with it were very successful. The only change required to the cron jobs themselves is to append a curl command to the end of each line in the crontab definition.
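A minimal Python sketch of that change, assuming crontab lines with five schedule fields (or an @-macro such as @daily) followed by the command. It follows the `&& curl ...` pattern from the Healthchecks documentation, so the ping is only sent after a successful run, and it conservatively rejects commands containing `%`, which cron treats specially. The function name `add_ping`, the backup script and the placeholder ping URL are hypothetical.

```python
import shlex

# Ping command as suggested in the Healthchecks docs: -f fails on HTTP errors,
# -sS is silent except for error messages, -m 10 caps the request at 10 seconds,
# --retry 5 retries transient failures, -o /dev/null discards the response.
CURL_PING = "curl -fsS -m 10 --retry 5 -o /dev/null"


def add_ping(cron_line: str, ping_url: str) -> str:
    """Return the crontab line with a Healthchecks success ping appended."""
    line = cron_line.strip()
    n_schedule = 1 if line.startswith("@") else 5
    fields = line.split(None, n_schedule)
    if len(fields) != n_schedule + 1:
        raise ValueError(f"not a crontab entry with a command: {line!r}")
    if "%" in fields[-1]:
        # cron turns % into a newline, so text appended after it would end up
        # on the job's stdin; such lines are rejected for manual review.
        raise ValueError(f"command contains '%': {line!r}")
    # "&&" sends the ping only if the command (for "a; b": its last part)
    # exits with status 0, so failed runs show up as missing pings.
    return f"{line} && {CURL_PING} {shlex.quote(ping_url)}"


print(add_ping("0 3 * * * /usr/local/bin/backup.sh --full",
               "https://hc-ping.com/your-uuid-here"))
# 0 3 * * * /usr/local/bin/backup.sh --full && curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/your-uuid-here
```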
Useful links:
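- https://docs.linuxserver.io/images/docker-healthchecks (Docker image for self-hosting)
- http://127.0.0.1:8094 (local instance)
- http://127.0.0.1:8094/docs/api/ (API documentation of the local instance)
- http://127.0.0.1:8094/api/v1/checks/ (API endpoint for checks)
- https://torsion.org/borgmatic/docs/how-to/monitor-your-backups/#healthchecks-hook (borgmatic's Healthchecks hook)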
Implementation would require the following steps:
- Write an Ansible lookup plugin that automatically registers each cron job with Healthchecks and returns the UUID of the corresponding check to Ansible.
- Use that UUID when rolling out the cron jobs, so that Ansible appends the correct curl command to each one.
Required effort: 6-8 hours
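The registration part of such a lookup plugin could look roughly like the sketch below. It assumes the Healthchecks Management API endpoint POST /api/v1/checks/ with a read-write API key passed in the X-Api-Key header, and takes the UUID from the last path segment of the returned ping_url. The function names, the check naming scheme and the default grace time of one hour are hypothetical; only the standard library is used, so the Ansible-specific wrapper is described in a comment rather than implemented.

```python
import json
import urllib.request


def build_check_payload(name: str, schedule: str, tz: str = "UTC",
                        grace: int = 3600) -> dict:
    """JSON body for POST /api/v1/checks/ describing one cron job."""
    return {
        "name": name,          # hypothetical naming scheme, e.g. "<host>: <job>"
        "schedule": schedule,  # the job's cron expression
        "tz": tz,              # time zone the cron expression is evaluated in
        "grace": grace,        # seconds a ping may be late before alerting
        "unique": ["name"],    # reuse an existing check with the same name
    }


def uuid_from_check(check: dict) -> str:
    """Take the check's UUID from the last path segment of its ping URL."""
    return check["ping_url"].rstrip("/").rsplit("/", 1)[-1]


def register_check(base_url: str, api_key: str, name: str, schedule: str,
                   tz: str = "UTC", grace: int = 3600) -> str:
    """Create (or look up) a check and return its UUID."""
    body = json.dumps(build_check_payload(name, schedule, tz, grace)).encode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/checks/",
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx answers, e.g. an invalid API key.
    with urllib.request.urlopen(request, timeout=10) as response:
        return uuid_from_check(json.load(response))


# In the lookup plugin (a subclass of ansible.plugins.lookup.LookupBase),
# run(self, terms, variables=None, **kwargs) could call register_check()
# once per cron job and return the list of UUIDs to the playbook.
```

Using `unique` keeps the plugin idempotent: rolling out the same cron jobs again returns the existing checks instead of creating duplicates, so their UUIDs, and therefore the curl commands in the crontabs, stay stable.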