Thanks Andrew. I upgraded corosync and pacemaker and the cluster works fine now.

On Thu, Jan 8, 2015 at 8:26 AM, Andrew Beekhof <and...@beekhof.net> wrote:

> On 15 Dec 2014, at 4:29 pm, Bharathiraja P <r...@where2getit.com> wrote:
> >
> > Hi Andrew,
> >
> > Frequently one node gets disconnected from CIB and stops the cluster
> > resources. I'm not able to start or cleanup failed actions for any of the
> > resources. For ex, if nodeA gets disconnected from CIB, I won't be able to
> > run actions on a resource like cleanup/stop/restart,... as that hangs
> > forever.
> >
> > In corosync log I will see a message like this
> > " cib: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(3760-5529-13) state:2"
> >
> > All I had to do is to force kill the cib process on both nodes multiple
> > times.
> >
> > Let me know if you need any other info to nail down this issue.
>
> For starters, we'd need to know what process 5529 was and what the rest of
> the processes in the cluster were doing.
> Its impossible to say anything from so few non-error logs.
>
> > --
> > Bharathiraja
> >
> > On Mon, Dec 15, 2014 at 9:19 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> >
> > > On 12 Dec 2014, at 9:57 pm, Bharathiraja P <r...@where2getit.com> wrote:
> > >
> > > Hi,
> > >
> > > We run pacemaker+corosync cluster on OpenSuSE 13.1 QEMU guests.
> > >
> > > Frequently, one node gets disconnected from cib. This is the message
> > > seen in corosync logs,
> > >
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_ipcs_dispatch_connection_request: HUP conn (3760-5529-13)
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(3760-5529-13) state:2
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: info: crm_client_destroy: Destroying 0 events
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-response-3760-5529-13-header
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-event-3760-5529-13-header
> > > Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-request-3760-5529-13-header
> > >
> > > Can you pls help fix the issue?
> >
> > What issue?
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org