On 2015-01-04T19:49:58, Oriol Mula-Valls <omv.li...@gmail.com> wrote:
> I have a two-node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102,
> corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we have had
> several reboots of the system due to SBD: on the 22nd, 24th and 26th. The
> last reboot happened yesterday, January 3rd. The message is the same every
> time.
>
> /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7879]: info: Cancelling IO request due to timeout (rw=0)
> /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7879]: ERROR: mbox read failed in servant.
> /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 7879) has terminated
> /var/log/messages:Jan 3 11:55:08 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 outdated (age: 4)
> /var/log/messages:Jan 3 11:55:08 kernighan sbd: [8183]: info: Servant starting for device /dev/sdc1
> /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: info: Cancelling IO request due to timeout (rw=0)
> /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: ERROR: Unable to read header from device 5
> /var/log/messages:Jan 3 11:55:11 kernighan sbd: [8183]: ERROR: Not a valid header on /dev/sdc1
> /var/log/messages:Jan 3 11:55:11 kernighan sbd: [7878]: WARN: Servant for /dev/sdc1 (pid: 8183) has terminated
> /var/log/messages:Jan 3 11:55:11 kernighan sbd: [7878]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>
> The sbd device is an iSCSI drive shared by a Synology box.
>
> Could anyone provide me some guidance on what's happening, please?

Those are pretty clearly IO errors due to high latency: each read of the
SBD device is cancelled when it hits the IO timeout, and once no servant
has reported liveness for long enough, the node resets itself.

You may need to increase the IO timeout, and/or figure out why the IO to
your Synology box sometimes stalls for multiple seconds. See the sbd
manpage for the timeout option; you can add the required flag to SBD_OPTS
in /etc/sysconfig/sbd.

You should also use a stable name (/dev/disk/by-id/...) rather than
/dev/sdc1. Note that /dev/sdX names may not be stable across reboots or
iSCSI restarts.

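As a rough sketch of what the two suggestions above could look like in
/etc/sysconfig/sbd: the device name below is a made-up placeholder, the
timeout value is purely illustrative, and -I is assumed to be the async IO
read timeout flag (default 3 s, which would match the 3 s between the
"Cancelling IO request" messages in the log). Please check all of this
against the manpage of the sbd version you actually have installed.

```shell
# /etc/sysconfig/sbd (sketch only, verify every flag with "man sbd")

# Stable by-id name instead of /dev/sdc1.
# "scsi-EXAMPLE-part1" is a hypothetical placeholder; look up the real
# name of your LUN with: ls -l /dev/disk/by-id/
SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-part1"

# -W    use the watchdog (assumed to be set already on this system)
# -I 5  assumed async IO read timeout in seconds; the value is illustrative
SBD_OPTS="-W -I 5"
```

Whatever value you pick has to stay consistent with the watchdog and
msgwait timeouts stored in the device header, which "sbd -d <device> dump"
will show. sbd needs to be restarted to pick up the change.
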
Further, you can avoid the reboots by enabling the pacemaker integration
(the -P flag); see the manpage for details on what that flag does. It will
be the default in later sbd versions, for releases after SLE HA 11.

Regards,
    Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde