On 2013-02-05T14:33:14, Ulrich Windl <[email protected]> wrote:
> I had an unexplained failure of the stonith monitor for SBD. When examining
> the syslog, I got the impression that RA configuration data got corrupted,
> causing an RA failure.
Interesting. Please file a bug report.
In the meantime, the easiest workaround is to drop sbd_device from the
configuration. If no parameters are specified, external/sbd sources
/etc/sysconfig/sbd and just works.
> I discovered more bad things: stonithd crashed:
> crmd: [9801]: info: process_lrm_event: LRM operation
> prm_stonith_sbd:1_monitor_180000 (call=89, status=1, cib-update=0,
> confirmed=true) Cancelled
> stonith-ng: [9797]: WARN: free_device: Removal of device 'prm_stonith_sbd:1'
> purged operation monitor
> kernel: [ 323.648355] show_signal_msg: 30 callbacks suppressed
> kernel: [ 323.648361] stonithd[9797]: segfault at 0 ip 00007f70528afb94 sp
> 00007fffaf06a410 error 4 in libcrmcommon.so.2.0.0[7f70528a4000+2d000]
> lrm-stonith: [14098]: ERROR: stonith_send_command: STONITH disconnected: 3
> lrm-stonith: [14098]: WARN: map_ra_retvalue: Mapped the invalid return code
> -10.
> lrmd: [9798]: info: operation stop[90] on prm_stonith_sbd:1 for client 9801:
> pid 14098 exited with return code 1
> crmd: [9801]: info: process_lrm_event: LRM operation prm_stonith_sbd:1_stop_0
> (call=90, rc=1, cib-update=145, confirmed=true) unknown error
> [...]
>
> It happened again (after another hard reset):
> kernel: [ 300.400783] stonithd[9798]: segfault at 0 ip 00007f8e32a18b94 sp
> 00007fffa5c954f0 error 4 in libcrmcommon.so.2.0.0[7f8e32a0d000+2d000]
Please do file a bug report for this one, and attach an hb_report, which
should include a parsed coredump.
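As a side note, the two kernel lines quoted above can already be compared before a coredump is available. The following is a minimal sketch, assuming the usual "segfault at ... ip ... sp ... error N in lib[base+size]" layout of the kernel message and the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write, bit 2: user mode). The name decode_segfault is purely illustrative, and the log lines have been re-joined onto one line each.

```python
import re

# The two segfault lines from the quoted syslog, re-joined after mail wrapping.
LINES = [
    "stonithd[9797]: segfault at 0 ip 00007f70528afb94 sp 00007fffaf06a410 "
    "error 4 in libcrmcommon.so.2.0.0[7f70528a4000+2d000]",
    "stonithd[9798]: segfault at 0 ip 00007f8e32a18b94 sp 00007fffa5c954f0 "
    "error 4 in libcrmcommon.so.2.0.0[7f8e32a0d000+2d000]",
]

# Assumed layout of the kernel's segfault message (all numbers are hex
# except the error code).
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the library-relative offset and decoded error bits of one line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line: %r" % line)
    err = int(m.group("err"))
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped library;
        # this is independent of where the library was loaded.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


for line in LINES:
    info = decode_segfault(line)
    print(info["library"], hex(info["offset"]), info)
```

Both lines decode to the same offset, 0xbb94, inside libcrmcommon.so.2.0.0, with fault address 0 and a user-mode read of a non-present page. That is consistent with the same NULL pointer dereference on both occasions, but only the parsed coredump will show which function it is.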
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde