On Sun, Dec 29, 2019 at 09:05:40PM +0100, Nicolò Balzarotti wrote:
> I think zabbix should work, but I've never used it. On the surface, it
> seems to have a steep learning curve, but this is just my impression.
The problem with these systems is that they target (complex) deployments that have people watching them. What I need is much simpler: I don't want to watch systems, but I need a cursory idea of the health of, say, 20-40 machines out there. I also want something that can notify me when things go really wrong, for example when backups fail. These are not massive requirements, just something flexible!

I used to have scripts for that which would mail/text me. But that was all a bit ad hoc, and I got tired of maintaining them and of getting repeated notifications ;)

What would be really cool is to be able to use logic programming. It would allow questions like: which services showed interruptions in the last month on low-RAM machines that also ran guix < 1.0 and a specific version of nginx?

This would mean storing the state of machines in a database that gets updated by messages. It means a good message broker. It means that every time you write a monitoring service, you'll have to write a receiver that turns its output into a data structure that something like miniKanren can reason over. The key is to make *creating* such small reporter/receiver tools really easy. Visualisations are less important, though I am sure some people enjoy creating those.

I.e., what I have in mind is a different type of system monitor: a minimalistic system that is hackable, works out of the box for Guix systems, and is really easy to extend. I think if we can prototype something in the coming months, it would make a great GSoC project to build out the functionality.

Pj.