Re: System monitoring

Pjotr Prins Sun, 29 Dec 2019 14:18:50 -0800

On Sun, Dec 29, 2019 at 09:05:40PM +0100, Nicolò Balzarotti wrote:
> I think zabbix should work, but I've never used it.  On the surface, it
> seems to have a steep learning curve, but this is just my impression.


The problem with these systems is that they target (complex)
deployments that have people watching them.

What I need is much simpler: I don't want to watch systems, but I
need a cursory idea of the health of, say, 20-40 machines out there. I
also want something that can notify me when things go really wrong,
for example when backups fail. These are not massive requirements,
just something flexible! I used to have scripts for that which would
mail or text me. But that was all a bit ad hoc, and I got tired of
maintaining them and of repeated notifications ;)

What would be really cool is to be able to use logic programming. It
would allow questions like:

  Which services showed interruptions in the last month on low-RAM
  machines that also ran Guix < 1.0 and a specific version of nginx?
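To make the shape of such a question concrete, here is a minimal Python sketch. It is not miniKanren: it swaps in plain comprehensions over hypothetical fact records (`Machine`, `Interruption`), and it assumes that "low RAM" means at most 4 GB and that "the last month" means the 30 days up to a given date. All names and sample data are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LOW_RAM_GB = 4  # assumption: what counts as "low RAM"

@dataclass(frozen=True)
class Machine:
    """Hypothetical fact about one machine's current state."""
    name: str
    ram_gb: int
    guix_version: tuple  # e.g. (0, 16, 0); tuples compare element-wise
    nginx_version: str

@dataclass(frozen=True)
class Interruption:
    """Hypothetical fact: a service on a machine was interrupted on a day."""
    machine: str
    service: str
    day: date

def services_with_interruptions(machines, interruptions, today, nginx_version):
    """Services interrupted in the 30 days up to `today` on low-RAM
    machines that run Guix < 1.0 and the given nginx version."""
    since = today - timedelta(days=30)
    # First "relation": machines matching all the machine-level conditions.
    suspects = {
        m.name
        for m in machines
        if m.ram_gb <= LOW_RAM_GB
        and m.guix_version < (1, 0)
        and m.nginx_version == nginx_version
    }
    # Join with the interruption facts on the machine name.
    return sorted({i.service for i in interruptions
                   if i.machine in suspects and i.day >= since})

if __name__ == "__main__":
    machines = [
        Machine("web1", 2, (0, 16, 0), "1.16.1"),
        Machine("web2", 16, (0, 16, 0), "1.16.1"),
        Machine("db1", 4, (1, 0, 1), "1.16.1"),
    ]
    interruptions = [
        Interruption("web1", "nginx", date(2019, 12, 20)),
        Interruption("web1", "backup", date(2019, 11, 1)),
        Interruption("web2", "nginx", date(2019, 12, 21)),
        Interruption("db1", "postgresql", date(2019, 12, 22)),
    ]
    print(services_with_interruptions(
        machines, interruptions, date(2019, 12, 29), "1.16.1"))  # ['nginx']
```

A logic engine such as miniKanren would let the same question be stated as relations rather than hand-written filters; the sketch only shows the facts and the join the question implies.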

This would mean storing the state of machines in a database that gets
updated by messages, which in turn means a good message broker. It also
means that every time you write a monitoring service, you'll have to
write a receiver that turns its messages into a data structure that
something like miniKanren can solve. The key is to make *creating*
such small reporter/receiver tools really easy.
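One way to keep reporters and receivers small is to agree on a tiny message format. The Python sketch below assumes, hypothetically, that every reporter emits a JSON object with `host`, `kind` and `value` fields, and that a receiver folds such messages into a state table. The schema, the `Store` class and `report_guix_version` are illustrative assumptions; the broker is left out, and a real setup would persist state (with timestamps) in a database rather than keep only the latest value in memory.

```python
import json
from collections import defaultdict

REQUIRED = {"host", "kind", "value"}  # assumed minimal message schema

class Store:
    """Hypothetical in-memory stand-in for the machine-state database."""
    def __init__(self):
        # host -> {kind -> latest value}
        self.state = defaultdict(dict)

    def update(self, host, kind, value):
        self.state[host][kind] = value

def report_guix_version(host, version):
    """A reporter: turn one observation into a message (a JSON string)."""
    return json.dumps({"host": host, "kind": "guix-version", "value": version})

def receive(store, message):
    """A receiver: parse a message and fold it into the store.
    Returns False (and changes nothing) for malformed messages."""
    msg = json.loads(message)
    if not isinstance(msg, dict) or not REQUIRED <= set(msg):
        return False
    store.update(msg["host"], msg["kind"], msg["value"])
    return True

if __name__ == "__main__":
    store = Store()
    for m in [report_guix_version("web1", "0.16.0"),
              report_guix_version("web1", "1.0.1"),
              '{"host": "web2"}']:  # incomplete message, ignored
        receive(store, m)
    print(dict(store.state))  # {'web1': {'guix-version': '1.0.1'}}
```

With a shared schema like this, a new reporter is a few lines that build a message, and the receiver does not need to change for each new `kind`.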

Visualisations are less important - though I am sure some people enjoy
creating those.

I.e., what I have in mind is a different type of systems monitor: a
minimalistic system that is hackable, works out of the box on Guix
systems, and is really easy to extend.

I think if we can prototype something in the coming months it would
make a great GSoC project to build out functionality.

Pj.
