Thanks a lot, Lars. I took advantage of a crash last week to add the -P parameter.
I'll read the sbd man page more carefully to work out how to increase the IO timeout.

Kind regards,
Oriol

On Wed, Jan 7, 2015 at 12:09 PM, Lars Marowsky-Bree <l...@suse.com> wrote:

> On 2015-01-04T19:49:58, Oriol Mula-Valls <omv.li...@gmail.com> wrote:
>
> > I have a two-node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102,
> > corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we have had
> > several reboots of the system due to SBD: on the 22nd, 24th and 26th.
> > The last reboot happened yesterday, January 3rd. The messages are the
> > same every time.
> >
> > /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7879]: info: Cancelling IO request due to timeout (rw=0)
> > /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7879]: ERROR: mbox read failed in servant.
> > /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 7879) has terminated
> > /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 outdated (age: 4)
> > /var/log/messages:Jan 3 11:55:08 kernighan sbd: [8183]: info: Servant starting for device /dev/sdc1
> > /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: info: Cancelling IO request due to timeout (rw=0)
> > /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: ERROR: Unable to read header from device 5
> > /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: ERROR: Not a valid header on /dev/sdc1
> > /var/log/messages:Jan 3 11:55:11 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 8183) has terminated
> > /var/log/messages:Jan 3 11:55:11 kernighan sbd: [7878]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> >
> > The SBD device is an iSCSI drive shared by a Synology box.
> >
> > Could anyone give me some guidance on what's happening, please?
>
> Those are pretty clearly IO errors due to high latency. You may need to
> increase the IO timeout, and/or figure out why the IO to your Synology
> box sometimes stalls for multiple seconds. See the manpage for this; you
> can add the required flag to SBD_OPTS in /etc/sysconfig/sbd.
>
> You should also use a stable name (/dev/disk/by-id/...) rather than
> /dev/sdc1 - note that /dev/sdX may not be stable over reboots or iSCSI
> restarts.
>
> Further, you can avoid the reboots by enabling the pacemaker
> integration (-P). See the manpage for details on what that flag does.
> That will be the default in later sbd versions for releases after SLE HA
> 11.
>
>
> Regards,
>     Lars
>
> --
> Architect Storage/HA
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild,
> Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
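
For reference, the configuration changes suggested in the quoted reply might look roughly like the fragment below. It is only a sketch under stated assumptions: SBD_DEVICE is assumed to be the variable in /etc/sysconfig/sbd that names the device, -W (use the watchdog) is assumed to be already present in SBD_OPTS, and the by-id path is left elided exactly as in the thread. The IO timeout option is not shown, since the thread only points to the manpage for it.

```shell
# /etc/sysconfig/sbd (sketch; adapt to the existing file)

# Stable device name instead of /dev/sdc1. The "..." is the placeholder
# used in the thread; substitute the real by-id link of the iSCSI LUN.
SBD_DEVICE="/dev/disk/by-id/..."

# -P enables the pacemaker integration mentioned in the thread.
# -W is an assumption about what is already configured here; keep
# whatever options the file currently contains and append -P to them.
SBD_OPTS="-W -P"
```

The new settings presumably only take effect once sbd has been restarted on the node.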