Re: [lopsa-tech] Need ideas/suggestions for bringing several VMs back online after an outage

Mathew Snyder Tue, 29 Oct 2013 18:32:34 -0700

The only problem I see with the onerror=panic option is that we don't
necessarily want servers to reboot immediately. An immediate reboot does us
no good while the network to the storage backend is down: in that scenario
we'd still have to reboot each VM manually once the storage is back online
and the VM can boot from its disk.
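Whether a panic actually reboots a Linux guest is controlled separately from the filesystem's error behavior, by the `kernel.panic` sysctl (`/proc/sys/kernel/panic`, also settable with the `panic=` boot parameter). A value of 0, the usual kernel default, makes the kernel hang at the panic. A positive value reboots after that many seconds, and a negative value reboots immediately. The sketch below assumes a Linux guest; the helper names `describe_panic_setting` and `read_panic_setting` are hypothetical.

```python
from pathlib import Path


def describe_panic_setting(value: int) -> str:
    """Interpret kernel.panic: seconds to wait after a panic before rebooting."""
    if value == 0:
        return "hang on panic (no automatic reboot)"
    if value < 0:
        return "reboot immediately on panic"
    return f"reboot {value} seconds after a panic"


def read_panic_setting(path: str = "/proc/sys/kernel/panic") -> int:
    """Read the current kernel.panic value (only meaningful on a Linux host)."""
    return int(Path(path).read_text().strip())
```

For example, `describe_panic_setting(read_panic_setting())` reports the guest's current behavior. Leaving the value at 0 keeps a panicked guest halted until someone resets it, so it never reboots while the storage network is still down. A positive value trades that for an unattended reboot after a delay.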


-Mathew

"When you do things right, people won't be sure you've done anything at
all." - God; Futurama

"We'll get along much better once you accept that you're wrong and neither
am I." - Me


On Tue, Oct 29, 2013 at 6:08 PM, Jonathan <lo...@redigloo.org> wrote:

> On Tue, Oct 29, 2013 at 8:27 PM, Mathew Snyder <mathew.sny...@gmail.com> wrote:
>
>> I'm looking at information for the onerror=panic option. What happens
>> when I cause a kernel panic besides the system essentially becoming
>> inoperable? Does it automatically force a fsck on the next reboot? So far,
>> everything I've seen indicates that it simply creates a crash dump. That
>> really isn't all that useful in this situation as we know what causes the
>> problem.
>>
> On 30/10/2013 00:36, Brandon Allbery wrote:
> fsck and normal shutdown both set a flag in the superblock indicating that
> the filesystem is clean; if this flag is not set then fsck is forced on
> reboot. Although, also important here, forcing a panic keeps the system
> from pointlessly trying to continue and behaving weirdly if the disks
> vanish out from under it.
>
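The clean flag Brandon mentions can be inspected directly if these guests use ext2/ext3/ext4. On those filesystems the panic behavior under discussion is spelled `errors=panic` as a mount option, or stored as the default in the superblock with `tune2fs -e panic`. The primary superblock starts 1024 bytes into the device. Its little-endian 16-bit `s_state` field (offset 0x3A) has bit 0x0001 set when the filesystem is clean and bit 0x0002 set when errors were detected. The following `s_errors` field (offset 0x3C) records the default error behavior: 1 continue, 2 remount read-only, 3 panic. These are the values `dumpe2fs -h` reports as "Filesystem state" and "Errors behavior". The sketch below decodes them from raw bytes. The helper names are hypothetical, and on a mounted, active filesystem the on-disk values may not match the kernel's in-memory state, so it is most useful on an unmounted volume or image.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # ext2/3/4 primary superblock starts 1 KiB into the device
EXT_MAGIC = 0xEF53

STATE_VALID = 0x0001  # cleanly unmounted or checked
STATE_ERROR = 0x0002  # errors detected

ERROR_BEHAVIOR = {1: "continue", 2: "remount-ro", 3: "panic"}


def read_ext_superblock(raw: bytes) -> dict:
    """Decode state and error behavior from the first bytes of an ext2/3/4 volume."""
    sb = raw[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 0x3E:
        raise ValueError("not enough data for an ext superblock")
    # s_magic, s_state and s_errors are consecutive little-endian u16 at 0x38..0x3D
    magic, state, errors = struct.unpack_from("<HHH", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}; not ext2/3/4")
    return {
        "clean": bool(state & STATE_VALID),
        "errors_detected": bool(state & STATE_ERROR),
        "on_error": ERROR_BEHAVIOR.get(errors, f"unknown ({errors})"),
    }


def read_ext_device(path: str) -> dict:
    """Read the superblock from a block device or image file (needs read access)."""
    with open(path, "rb") as f:
        return read_ext_superblock(f.read(SUPERBLOCK_OFFSET + 1024))
```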
> As Brandon says, a panic stops the machine dead in its tracks, and
> causes it to reboot, with an fsck on the way up.  I've seen cases where a
> SAN gets heavily loaded (e.g. a boot storm) where running VMs will see SCSI
> timeouts and force file systems read-only, but ONLY on the active file
> systems.  For example, /var may become read-only whilst / remains
> writeable.  This gets ugly.  I've found systems which happily answer their
> Nagios probes, but some volume is essentially off-line.  Once a file system
> is read-only you are not going to be able to write any pending data to it,
> so you might as well crash.  If the SAN was just short-term overloaded, the
> system will likely come straight back up.  If the SAN is unavailable, the
> system will be unable to boot, but a downed host is easier to spot than one
> with a random volume in read-only mode.
>
> Jonathan.
>
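Jonathan's case, a host that answers its Nagios probes while one volume has silently gone read-only, can be caught by a check that compares `/proc/mounts` against the filesystems expected to be writable. The following is a minimal sketch that assumes a Linux guest and the usual Nagios plugin exit-code convention (0 OK, 2 CRITICAL). The function names and the mount-point list are illustrative.

```python
import re

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def _unescape(field: str) -> str:
    """/proc/mounts encodes spaces, tabs, etc. as octal escapes such as \\040."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def check_readonly(mounts_text: str, expected_rw: list[str]) -> tuple[int, str]:
    """Return (exit_code, message) for mount points that should be rw but are not."""
    options = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # Later entries overwrite earlier ones, so the top-most mount on a path wins.
        options[_unescape(fields[1])] = fields[3].split(",")
    readonly = [m for m in expected_rw if "ro" in options.get(m, [])]
    missing = [m for m in expected_rw if m not in options]
    problems = [f"{m} is read-only" for m in readonly] + [f"{m} not mounted" for m in missing]
    if problems:
        return CRITICAL, "CRITICAL: " + "; ".join(problems)
    return OK, "OK: all expected filesystems are writable"


def run(expected_rw: list[str]) -> int:
    """Check the live mount table, print the status line, return the exit code."""
    with open("/proc/mounts") as f:
        code, message = check_readonly(f.read(), expected_rw)
    print(message)
    return code
```

Used as a plugin, a script would end with `sys.exit(run(["/", "/var"]))`. Checking the mount options rather than trying a test write keeps the probe itself from touching the storage.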

_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
