On Fri, May 17, 2019 at 6:28 AM Mick <michaelkintz...@gmail.com> wrote:
>
> Count yourself lucky.  You could have discovered your disk wouldn't spin up
> again, your PSU packed up, or even the MoBo chipset decided to retire from
> active service.  Eventually, any of these hardware problems would manifest
> themselves, but a reboot could reveal their demise sooner and hopefully at a
> point where you were somewhat prepared for it.
>

++

You can't completely prevent reboots (not unless you are willing to
spend big and go mainframe or something like that - and those create a
different set of issues).  What you can do is take steps to reduce the
risk that an unplanned reboot will cause problems.

One of the best ways to ensure you're prepared for disaster is to make
disaster routine.  Regular reboots can be a part of this, because you
can do them at a time when you have time to deal with problems, and
when you're looking for problems.

This is why I've largely made the move to containers.  I still have a
few processes running on my host, but almost everything has
moved into containers that do one thing.  When I update a container I
take a snapshot, run updates, shut it down, take another snapshot,
start it up, and test the service it runs.  Since each container only
does one thing, I know exactly what to test.  If it works I'm good,
and if it doesn't work I can roll it back and not worry about what
that might break on the 47 other services running on the same host.
Every update involves an effective reboot for that one service, so I
know that in the event of a host reboot they will generally all come
up fine.  I of course update the host regularly and reboot that for
kernel updates, which seem to come about twice a week these days
anyway.
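
The per-container routine above can be sketched roughly as follows.
This is only an illustration in Python, not the scripts I actually
use: FakeContainer, update_container, bump and the service_ok check
are all hypothetical names, and the in-memory object stands in for
whatever snapshot and container tooling you really run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeContainer:
    """In-memory stand-in for a one-service container (hypothetical)."""
    name: str
    version: int = 1
    running: bool = True
    snapshots: List[int] = field(default_factory=list)

    def snapshot(self) -> int:
        # Record the current version; the returned id is used for rollback.
        self.snapshots.append(self.version)
        return len(self.snapshots) - 1

    def rollback(self, snap_id: int) -> None:
        self.version = self.snapshots[snap_id]

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


def update_container(
    c: FakeContainer,
    run_updates: Callable[[FakeContainer], None],
    service_ok: Callable[[FakeContainer], bool],
) -> bool:
    """Snapshot, update, restart, test; roll back if the test fails."""
    before = c.snapshot()   # state known to work
    run_updates(c)
    c.stop()
    c.snapshot()            # updated state, cleanly shut down
    c.start()               # the "effective reboot" for this service
    if service_ok(c):
        return True
    # The one thing this container does is broken: go back and restart.
    c.stop()
    c.rollback(before)
    c.start()
    return False


def bump(c: FakeContainer) -> None:
    # Placeholder for the real update step inside the container.
    c.version += 1


imapd = FakeContainer("imapd")
updated = update_container(imapd, bump, lambda c: c.running)
```

The point of passing in service_ok is that each container has exactly
one check that matters, so a failed check maps to exactly one rollback.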

Obviously I don't run updates the day before I leave on vacation,
unless they are security critical, and then I exercise some care.

The downside is that I end up with a lot more hosts to keep up to
date, because I can't just run emerge -u world once on one host and
have every service I run updated.  However, I gladly accept the extra
work because the work itself becomes much simpler and more predictable.  If
I'm updating my imapd container and imapd still works, then I'm fine.
I don't have to worry about suddenly realizing two days later that
postgrey is bouncing a ton of mail or whatever.  If something obscure
like a text editor breaks in my imapd container and I don't catch it,
that might be an annoyance, but it doesn't really impact me much since
it isn't critical to the operation of that container.

-- 
Rich
