Re: [Pacemaker] crmd does abort if a stopped node is specified

Kazunori INOUE Wed, 23 Apr 2014 20:42:07 -0700

2014-04-23 19:32 GMT+09:00 Andrew Beekhof <and...@beekhof.net>:
>
> On 23 Apr 2014, at 7:17 pm, Kazunori INOUE <kazunori.ino...@gmail.com> wrote:
>
>> 2014-04-22 0:45 GMT+09:00 David Vossel <dvos...@redhat.com>:
>>>
>>> ----- Original Message -----
>>>> From: "Kazunori INOUE" <kazunori.ino...@gmail.com>
>>>> To: "pm" <pacemaker@oss.clusterlabs.org>
>>>> Sent: Friday, April 18, 2014 4:49:42 AM
>>>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>>>
>>>> Hi,
>>>>
>>>> crmd aborts if I load a CIB that specifies a stopped node.
>>>>
>>>> # crm_mon -1
>>>> Last updated: Fri Apr 18 11:51:36 2014
>>>> Last change: Fri Apr 18 11:51:30 2014
>>>> Stack: corosync
>>>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>>>> Version: 1.1.11-cf82673
>>>> 1 Nodes configured
>>>> 0 Resources configured
>>>>
>>>> Online: [ pm103 ]
>>>>
>>>> # cat test.cli
>>>> node pm103
>>>> node pm104
>>>>
>>>> # crm configure load update test.cli
>>>>
>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_int_helper:
>>>> Characters left over after parsing 'pm104': 'pm104'
>>>> Apr 18 11:52:42 pm103 crmd[11672]:    error: crm_abort: crm_get_peer:
>>>> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>>>> Apr 18 11:52:42 pm103 pacemakerd[11663]:    error: child_waitpid:
>>>> Managed process 11672 (crmd) dumped core
>>>>
>>>> (gdb) bt
>>>> #0  0x00000033da432925 in raise () from /lib64/libc.so.6
>>>> #1  0x00000033da434105 in abort () from /lib64/libc.so.6
>>>> #2  0x00007f30241b7027 in crm_abort (file=0x7f302440b0b3
>>>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>>>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>>>> do_fork=0) at utils.c:1177
>>>> #3  0x00007f30244048ee in crm_get_peer (id=0, uname=0x0) at 
>>>> membership.c:420
>>>> #4  0x00007f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>>>
>>> Is the uuid for your cluster nodes supposed to be the same as the uname?
>>> We're treating the uuid in this situation as if it should be a number,
>>> which it clearly is not.
>>
>> OK, I got it.
>>
>> By the way, is there a way to find out the id of a node before starting
>> pacemaker?
>
> Normally it comes from corosync, so not really :-(


It seems the only way is to set the nodeid explicitly for each node in the
nodelist directive of corosync.conf:

nodelist {
  node {
    ring0_addr: 192.168.101.143
    nodeid: 3
  }
  node {
    ring0_addr: 192.168.101.144
    nodeid: 4
  }
}
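
To make the failure in the quoted log concrete, here is a minimal Python
sketch of what seems to happen when the uuid "pm104" is treated as a number.
This is not Pacemaker's code: the function names parse_node_id, get_peer and
peer_for_uuid are made up for illustration, and the parsing only imitates
C's strtoll-style behaviour. The assert condition and the "Characters left
over" message are taken from the log above.

```python
import re


def parse_node_id(text):
    """Parse a leading decimal integer roughly the way C's strtoll would.

    Returns (value, leftover). If no digits are found, value is 0 and
    the whole input is left over.
    """
    match = re.match(r"\s*[+-]?\d+", text)
    if match is None:
        return 0, text
    return int(match.group()), text[match.end():]


def get_peer(node_id, uname):
    """Hypothetical lookup that enforces the condition from the log."""
    if not (node_id > 0 or uname is not None):
        raise AssertionError("id > 0 || uname != NULL")
    return {"id": node_id, "uname": uname}


def peer_for_uuid(uuid):
    """Treat the uuid as a numeric node id, as described in the thread."""
    node_id, leftover = parse_node_id(uuid)
    if leftover:
        print(f"Characters left over after parsing '{uuid}': '{leftover}'")
    # No uname is known at this point, so only the numeric id is passed on.
    return get_peer(node_id, None)


if __name__ == "__main__":
    # A numeric id, like the nodeid values in the nodelist, parses cleanly.
    print(peer_for_uuid("4"))
    # A uname used as the id yields 0 and trips the assert.
    try:
        peer_for_uuid("pm104")
    except AssertionError as err:
        print("fatal assert:", err)
```

With a numeric id the lookup goes through. With "pm104" the parsed id is 0
and there is no uname, which matches crm_get_peer (id=0, uname=0x0) in the
backtrace.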

Thanks!

>
>>
>>>
>>> -- Vossel
>>>
>>>
>>>> cluster.c:386
>>>> #5  0x000000000043afbd in abort_transition_graph
>>>> (abort_priority=1000000, abort_action=tg_restart, abort_text=0x44d2f4
>>>> "Non-status change", reason=0x113e4b0, fn=0x44df07 "te_update_diff",
>>>> line=382) at te_utils.c:518
>>>> #6  0x000000000043caa4 in te_update_diff (event=0x10f2240
>>>> "cib_diff_notify", msg=0x1137660) at te_callbacks.c:382
>>>> #7  0x00007f302461d1bc in cib_native_notify (data=0x10ef750,
>>>> user_data=0x1137660) at cib_utils.c:733
>>>> #8  0x00000033db83d6bc in g_list_foreach () from /lib64/libglib-2.0.so.0
>>>> #9  0x00007f3024620191 in cib_native_dispatch_internal
>>>> (buffer=0xe61ea8 "<notify t=\"cib_notify\" subt=\"cib_diff_notify\"
>>>> cib_op=\"cib_apply_diff\" cib_rc=\"0\"
>>>> cib_object_type=\"diff\"><cib_generation><generation_tuple epoch=\"4\"
>>>> num_updates=\"0\" admin_epoch=\"0\" validate-with=\"pacem"...,
>>>> length=1708, userdata=0xe5eb90) at cib_native.c:123
>>>> #10 0x00007f30241dee72 in mainloop_gio_callback (gio=0xf61ea0,
>>>> condition=G_IO_IN, data=0xe601b0) at mainloop.c:639
>>>> #11 0x00000033db83feb2 in g_main_context_dispatch () from
>>>> /lib64/libglib-2.0.so.0
>>>> #12 0x00000033db843d68 in ?? () from /lib64/libglib-2.0.so.0
>>>> #13 0x00000033db844275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
>>>> #14 0x0000000000406469 in crmd_init () at main.c:154
>>>> #15 0x00000000004062b0 in main (argc=1, argv=0x7fff908829f8) at main.c:121
>>>>
>>>> Is this all right?
>>>>
>>>> Best Regards,
>>>> Kazunori INOUE
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
