Hi All,

One of our users ran a cibadmin command incorrectly by mistake. As a result of this operation error, crmd dumps core and is restarted.

Step 1) Start a cluster.

[root@rh70-node1 ~]# crm_mon -1 -Af
Last updated: Wed Nov 5 10:26:51 2014
Last change: Wed Nov 5 10:23:39 2014
Stack: corosync
Current DC: rh70-node1 (3232238160) - partition WITHOUT quorum
Version: 1.1.12-85c093e
1 Nodes configured
0 Resources configured

Online: [ rh70-node1 ]

Node Attributes:
* Node rh70-node1:

Migration summary:
* Node rh70-node1:

Step 2) A user adds a node with an incorrect specification.

cibadmin -C -o nodes -X '<node id="hpg604" type="normal" uname="hpg604"/>'

crmd dumps core and is respawned.

----------------------------
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_process_request: Forwarding cib_create operation for section nodes to master (origin=local/cibadmin/2)
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: Diff: --- 0.2.7 2
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: Diff: +++ 0.3.0 92153f86c58ed569196d946612f0dab8
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: + /cib: @epoch=3, @num_updates=0
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_perform_op: ++ /cib/configuration/nodes: <node id="hpg604" type="normal" uname="hpg604"/>
Nov 5 10:28:17 rh70-node1 cib[2167]: info: cib_process_request: Completed cib_create operation for section nodes: OK (rc=0, origin=rh70-node1/cibadmin/2, version=0.3.0)
Nov 5 10:28:17 rh70-node1 crmd[2172]: error: crm_int_helper: Characters left over after parsing 'hpg604': 'hpg604'
Nov 5 10:28:17 rh70-node1 crmd[2172]: error: crm_abort: crm_find_peer: Triggered fatal assert at membership.c:338 : id > 0 || uname != NULL
Nov 5 10:28:17 rh70-node1 cib[2223]: info: write_cib_contents: Archived previous version as /var/lib/pacemaker/cib/cib-2.raw
Nov 5 10:28:17 rh70-node1 cib[2223]: info: write_cib_contents: Wrote version 0.3.0 of the CIB to disk (digest: fd92fe00a0f0478246b1c9f1d2be83a8)
Nov 5 10:28:17 rh70-node1 cib[2223]: info: retrieveCib: Reading cluster configuration from: /var/lib/pacemaker/cib/cib.CARj72 (digest: /var/lib/pacemaker/cib/cib.XK4ybJ)
Nov 5 10:28:17 rh70-node1 abrt-hook-ccpp: Saved core dump of pid 2172 (/usr/libexec/pacemaker/crmd) to /var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172 (18141184 bytes)
Nov 5 10:28:18 rh70-node1 abrt-server: Executable '/usr/libexec/pacemaker/crmd' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Nov 5 10:28:18 rh70-node1 abrt-server: 'post-create' on '/var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172' exited with 1
Nov 5 10:28:18 rh70-node1 abrt-server: Deleting problem directory '/var/tmp/abrt/ccpp-2014-11-05-10:28:17-2172'
Nov 5 10:28:18 rh70-node1 pacemakerd[2166]: error: child_waitpid: Managed process 2172 (crmd) dumped core
Nov 5 10:28:18 rh70-node1 pacemakerd[2166]: error: pcmk_child_exit: The crmd process (2172) terminated with signal 6 (core=1)
Nov 5 10:28:18 rh70-node1 pacemakerd[2166]: notice: pcmk_process_exit: Respawning failed child process: crmd
Nov 5 10:28:18 rh70-node1 pacemakerd[2166]: info: start_child: Using uid=992 and group=990 for process crmd
Nov 5 10:28:18 rh70-node1 pacemakerd[2166]: info: start_child: Forked child 2228 for process crmd
Nov 5 10:28:18 rh70-node1 crmd[2228]: info: crm_log_init: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
Nov 5 10:28:18 rh70-node1 crmd[2228]: notice: main: CRM Git Version: 85c093e
Nov 5 10:28:18 rh70-node1 crmd[2228]: info: do_log: FSA: Input I_STARTUP from crmd_init() received in state S_STARTING
Nov 5 10:28:18 rh70-node1 crmd[2228]: info: get_cluster_type: Verifying cluster type: 'corosync'
----------------------------

This was an operation error by the user, but it is not desirable for crmd to dump core and restart because of it. We request an improvement so that crmd does not restart in this case.

Best Regards,
Hideo Yamauchi.
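
Until crmd handles this input gracefully, a guard on the operator side may help avoid the mistake. The following is only a sketch: the function names is_numeric_node_id and add_node are hypothetical and not part of Pacemaker. It assumes, based on the crm_int_helper error and the "id > 0" condition in the log above, that on the corosync stack the node id is expected to be a positive integer.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): succeed only if the argument
# is a positive decimal integer. This mirrors the "id > 0" condition that
# appears in the assert message in the log; it is an assumption, not a
# statement of what crmd actually validates.
is_numeric_node_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    case "$1" in
        *[1-9]*) return 0 ;;       # at least one non-zero digit, so value > 0
        *)       return 1 ;;       # only zeros
    esac
}

# Hypothetical wrapper: refuse to call cibadmin when the id is not numeric.
# The cibadmin invocation itself is the one from Step 2 of this report.
add_node() {
    node_id=$1
    node_uname=$2
    if ! is_numeric_node_id "$node_id"; then
        echo "refusing to add node: id '$node_id' is not a positive integer" >&2
        return 1
    fi
    cibadmin -C -o nodes -X "<node id=\"$node_id\" type=\"normal\" uname=\"$node_uname\"/>"
}
```

With this wrapper, "add_node hpg604 hpg604" is rejected before the CIB is modified. It is a workaround on the caller's side only and does not replace the requested fix in crmd.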