SpO₂, the little dynamic monitoring tool
SpO₂ stands for oxygen saturation, a measurement used in patient monitoring.
I am Kerollmops, the CTO of Meili, and today I am releasing SpO₂.
At Meili, we needed a tool to monitor our pods. We already have vigil, which health checks our front page and backend, but the number of these services is limited: we do not spin up new front or backend servers dynamically (for now). Search engines are different. Each time we create a new one for a user we instantiate a Kubernetes pod, and we need to monitor the health of that service. Adding each of those URLs by hand to the vigil config file is not a solution.
So we decided that we needed a simple tool, one that accepts HTTP requests to register and unregister the URLs to health check. We use the new async/await Rust syntax along with tide for the HTTP server, no big deal here.
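To make the register/unregister idea concrete, here is a minimal sketch of the bookkeeping such endpoints could sit on top of. It is not the actual SpO₂ code: the `Registry` type, its method names, and the example URLs are assumptions made up for illustration, and the HTTP layer (tide) is left out so the snippet only needs the standard library.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory registry of the URLs to health check.
/// (Illustrative only; not the real SpO₂ data structure.)
#[derive(Default)]
struct Registry {
    urls: BTreeSet<String>,
}

impl Registry {
    /// Registers a URL. Returns `true` if it was not registered before.
    fn register(&mut self, url: &str) -> bool {
        self.urls.insert(url.to_string())
    }

    /// Unregisters a URL. Returns `true` if it was actually registered.
    fn unregister(&mut self, url: &str) -> bool {
        self.urls.remove(url)
    }

    /// Iterates over the registered URLs in sorted order.
    fn urls(&self) -> impl Iterator<Item = &str> + '_ {
        self.urls.iter().map(String::as_str)
    }
}

fn main() {
    let mut registry = Registry::default();

    // A handler for a "register" request would end up calling this.
    registry.register("http://pod-1.example/health");
    registry.register("http://pod-2.example/health");

    // A handler for an "unregister" request would end up calling this.
    registry.unregister("http://pod-1.example/health");

    for url in registry.urls() {
        println!("still monitoring {}", url);
    }
}
```

Returning a boolean from both operations lets an HTTP handler tell a first registration apart from a repeated one, which is handy when pods are created and destroyed automatically.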
Our current cloud provider is DigitalOcean, and a monitoring tool should not run on the same infrastructure it watches, so we could not host our SpO₂ service there. We chose Scaleway because it is much cheaper and works out of the box. We also need persistent storage for the health-checked URLs: if they were only kept in RAM, they would all be lost whenever the server restarts. I already worked on a disk-backed key-value store in Rust named Sled, so we chose to rely on it.
@qdequele built the front-end with vanilla JavaScript. Using WebSocket, we are able to display the status of the pods in real time. As most humans do not stand in front of a TV the whole day watching for color changes to alert the devops, we decided to implement notifications. We use Slack for all of our monitoring tools, and because it is as simple as a webhook to implement, we went with it.
In the last release, we made some improvements to the Slack notification system. We now batch status change events by 40: SpO₂ sends one message containing at most 40 events, which keeps it from spamming the channel. Each message also displays the HTTP status for an unhealthy measurement and the error message for an unreachable one.
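The batching described above can be sketched in a few lines of Rust. This is only an illustration under assumptions, not the code SpO₂ ships: the `Status` and `Event` types, the `describe` and `batch_messages` functions, and the message wording are all made up here, and the actual call to the Slack webhook is left out.

```rust
/// Hypothetical outcome of one health check.
enum Status {
    Healthy,
    /// The service answered, but with a bad HTTP status code.
    Unhealthy(u16),
    /// The service could not be reached; carries the error message.
    Unreachable(String),
}

/// Hypothetical status change event for one monitored URL.
struct Event {
    url: String,
    status: Status,
}

/// Maximum number of events in one Slack message, as stated in the post.
const BATCH_SIZE: usize = 40;

/// Formats one event as one line of text.
fn describe(event: &Event) -> String {
    match &event.status {
        Status::Healthy => format!("{} is healthy", event.url),
        Status::Unhealthy(code) => format!("{} is unhealthy (HTTP {})", event.url, code),
        Status::Unreachable(err) => format!("{} is unreachable ({})", event.url, err),
    }
}

/// Groups events into message bodies of at most `BATCH_SIZE` lines each.
fn batch_messages(events: &[Event]) -> Vec<String> {
    events
        .chunks(BATCH_SIZE)
        .map(|chunk| chunk.iter().map(describe).collect::<Vec<_>>().join("\n"))
        .collect()
}

fn main() {
    let mut events = vec![
        Event { url: "http://pod-a.example".to_string(), status: Status::Healthy },
        Event {
            url: "http://pod-b.example".to_string(),
            status: Status::Unreachable("connection refused".to_string()),
        },
    ];
    // Simulate a burst of failing pods.
    events.extend((0..83).map(|i| Event {
        url: format!("http://pod-{}.example", i),
        status: Status::Unhealthy(503),
    }));

    let messages = batch_messages(&events);
    println!("{} events became {} messages", events.len(), messages.len());
}
```

With 85 events, `chunks(40)` yields groups of 40, 40, and 5, so only three messages would be posted instead of 85.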
SpO₂ does not support SSL/TLS by itself, neither for the HTTP nor for the WebSocket endpoints. We needed this kind of security, so we looked at NGINX, a tiny little obscure reverse proxy server, which we configured with basic authentication. It is not an easy task, and because we are cool we wrote documentation to help you do the same.
Do not hesitate to share or star this project. Pull requests are welcome 😊
And just as a side note: we actually measure machines, not humans.