2014-03-17 16:37 GMT+09:00 Kazunori INOUE <kazunori.ino...@gmail.com>:
> 2014-03-15 4:08 GMT+09:00 David Vossel <dvos...@redhat.com>:
>>
>>
>> ----- Original Message -----
>>> From: "Kazunori INOUE" <kazunori.ino...@gmail.com>
>>> To: "pm" <pacemaker@oss.clusterlabs.org>
>>> Sent: Friday, March 14, 2014 5:52:38 AM
>>> Subject: [Pacemaker] crmd was aborted at pacemaker 1.1.11
>>>
>>> Hi,
>>>
>>> When specifying the node name in UPPER case and performing
>>> crm_resource, crmd was aborted.
>>> (The real node name is in LOWER case.)
>>
>> https://github.com/ClusterLabs/pacemaker/pull/462
>>
>> does that fix it?
>>
>
> The result is NO; glib behaves in a way I did not expect.
> I tested this branch.
> https://github.com/davidvossel/pacemaker/tree/lrm-segfault
> * Red Hat Enterprise Linux Server release 6.4 (Santiago)
> * glib2-2.22.5-7.el6.x86_64
>
> strcase_equal() is not called from g_hash_table_lookup().
>
> [x3650h ~]$ gdb /usr/libexec/pacemaker/crmd 17409
> ...snip...
> (gdb) b lrm.c:1232
> Breakpoint 1 at 0x4251d0: file lrm.c, line 1232.
> (gdb) b strcase_equal
> Breakpoint 2 at 0x429828: file lrm_state.c, line 95.
> (gdb) c
> Continuing.
>
> Breakpoint 1, do_lrm_invoke (action=288230376151711744,
> cause=C_IPC_MESSAGE, cur_state=S_NOT_DC, current_input=I_ROUTER,
> msg_data=0x7fff8d679540) at lrm.c:1232
> 1232        lrm_state = lrm_state_find(target_node);
> (gdb) s
> lrm_state_find (node_name=0x1d4c650 "X3650H") at lrm_state.c:267
> 267     {
> (gdb) n
> 268         if (!node_name) {
> (gdb) n
> 271         return g_hash_table_lookup(lrm_state_table, node_name);
> (gdb) p g_hash_table_size(lrm_state_table)
> $1 = 1
> (gdb) p (char*)((GList*)g_hash_table_get_keys(lrm_state_table))->data
> $2 = 0x1c791a0 "x3650h"
> (gdb) p node_name
> $3 = 0x1d4c650 "X3650H"
> (gdb) n
> 272     }
> (gdb) n
> do_lrm_invoke (action=288230376151711744, cause=C_IPC_MESSAGE,
> cur_state=S_NOT_DC, current_input=I_ROUTER, msg_data=0x7fff8d679540)
> at lrm.c:1234
> 1234        if (lrm_state == NULL && is_remote_node) {
> (gdb) n
> 1240        CRM_ASSERT(lrm_state != NULL);
> (gdb) n
>
> Program received signal SIGABRT, Aborted.
> 0x0000003787e328a5 in raise () from /lib64/libc.so.6
> (gdb)
>
>
> I wonder why... so I will continue investigation.
>
>
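For reference, the lookup miss in the session above can be reproduced outside crmd. This is only a rough sketch, not Pacemaker or glib code. To avoid depending on glib, lookup(), struct slot, hash_cs(), hash_ci(), equal_ci() and upper_case_lookup_hits() are hypothetical stand-ins that I made up for illustration. The only assumption about the hash table is that it compares the stored hash value first and calls the equality callback only when the hash values match.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-sensitive string hash (h * 31 + c). Assumption: this models a
 * hash that does no case folding. */
static unsigned int hash_cs(const char *s)
{
    unsigned int h = 0;

    for (; *s != '\0'; s++)
        h = (h << 5) - h + (unsigned char)*s;
    return h;
}

/* Case-folding variant: lower-case every byte before mixing it in. */
static unsigned int hash_ci(const char *s)
{
    unsigned int h = 0;

    for (; *s != '\0'; s++)
        h = (h << 5) - h + (unsigned int)tolower((unsigned char)*s);
    return h;
}

/* Counts how often the equality callback actually runs. */
static int equal_calls = 0;

/* Case-insensitive equality, standing in for strcase_equal(). */
static int equal_ci(const char *a, const char *b)
{
    equal_calls++;
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Hypothetical one-entry "table": the hash is stored with the key. */
struct slot {
    const char *key;
    unsigned int hash;
};

/* Compare hash values first; call equal_ci() only when they match. */
static const char *lookup(const struct slot *slot, const char *key,
                          unsigned int (*hash_fn)(const char *))
{
    assert(slot != NULL && key != NULL);
    if (slot->hash != hash_fn(key))
        return NULL;    /* hash mismatch: equal_ci() is never called */
    return equal_ci(slot->key, key) ? slot->key : NULL;
}

/* Store "x3650h", then look up "X3650H" with the same hash function.
 * Returns 1 when the lookup finds the entry, 0 otherwise. */
static int upper_case_lookup_hits(unsigned int (*hash_fn)(const char *))
{
    struct slot slot = { "x3650h", hash_fn("x3650h") };

    return lookup(&slot, "X3650H", hash_fn) != NULL;
}
```

Calling upper_case_lookup_hits(hash_cs) from a main() gives a miss and leaves equal_calls at 0, which matches the strcase_equal breakpoint never being hit. With hash_ci the same lookup finds the entry and equal_ci() runs once.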
I read the code of g_hash_table_lookup().
Before strcase_equal() is called, keys are first compared by the hash
values generated by crm_str_hash. crm_str_hash is case-sensitive, so
"X3650H" and "x3650h" get different hash values and strcase_equal() is
never reached.

*** This is a quick-fix solution. ***

 crmd/lrm_state.c   |  4 ++--
 include/crm/crm.h  |  2 ++
 lib/common/utils.c | 11 +++++++++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/crmd/lrm_state.c b/crmd/lrm_state.c
index d20d74a..ae036fd 100644
--- a/crmd/lrm_state.c
+++ b/crmd/lrm_state.c
@@ -234,13 +234,13 @@ lrm_state_init_local(void)
     }
 
     lrm_state_table =
-        g_hash_table_new_full(crm_str_hash, strcase_equal, NULL, internal_lrm_state_destroy);
+        g_hash_table_new_full(crm_str_hash2, strcase_equal, NULL, internal_lrm_state_destroy);
     if (!lrm_state_table) {
         return FALSE;
     }
 
     proxy_table =
-        g_hash_table_new_full(crm_str_hash, strcase_equal, NULL, remote_proxy_free);
+        g_hash_table_new_full(crm_str_hash2, strcase_equal, NULL, remote_proxy_free);
     if (!proxy_table) {
         g_hash_table_destroy(lrm_state_table);
         return FALSE;
diff --git a/include/crm/crm.h b/include/crm/crm.h
index b763cc0..46fe5df 100644
--- a/include/crm/crm.h
+++ b/include/crm/crm.h
@@ -195,7 +195,9 @@ typedef GList *GListPtr;
 # include <crm/error.h>
 
 # define crm_str_hash g_str_hash_traditional
+# define crm_str_hash2 g_str_hash_traditional2
 
 guint g_str_hash_traditional(gconstpointer v);
+guint g_str_hash_traditional2(gconstpointer v);
 
 #endif
diff --git a/lib/common/utils.c b/lib/common/utils.c
index 29d7965..50fa6c0 100644
--- a/lib/common/utils.c
+++ b/lib/common/utils.c
@@ -2368,6 +2368,17 @@ g_str_hash_traditional(gconstpointer v)
 
     return h;
 }
 
+guint
+g_str_hash_traditional2(gconstpointer v)
+{
+    const signed char *p;
+    guint32 h = 0;
+
+    for (p = v; *p != '\0'; p++)
+        h = (h << 5) - h + g_ascii_tolower(*p);
+
+    return h;
+}
 void *
 find_library_function(void **handle, const char *lib, const char *fn, gboolean fatal)

>>> # crm_resource -C -r p1 -N X3650H
>>> Cleaning up p1 on X3650H
>>> Waiting for 1 replies from the CRMdNo messages received in 60 seconds..
>>> aborting
>>>
>>> Mar 14 18:33:10 x3650h crmd[10718]: error: crm_abort:
>>> do_lrm_invoke: Triggered fatal assert at lrm.c:1240 : lrm_state !=
>>> NULL
>>> ...snip...
>>> Mar 14 18:33:10 x3650h pacemakerd[10708]: error: child_waitpid:
>>> Managed process 10718 (crmd) dumped core
>>>
>>>
>>> * The state before performing crm_resource.
>>> ----
>>> Stack: corosync
>>> Current DC: x3650g (3232261383) - partition with quorum
>>> Version: 1.1.10-38c5972
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ x3650g x3650h ]
>>>
>>> Full list of resources:
>>>
>>> f-g (stonith:external/ibmrsa-telnet): Started x3650h
>>> f-h (stonith:external/ibmrsa-telnet): Started x3650g
>>> p1 (ocf::pacemaker:Dummy): Stopped
>>>
>>> Migration summary:
>>> * Node x3650g:
>>> * Node x3650h:
>>> p1: migration-threshold=1 fail-count=1 last-failure='Fri Mar 14
>>> 18:32:48 2014'
>>>
>>> Failed actions:
>>> p1_monitor_10000 on x3650h 'not running' (7): call=16,
>>> status=complete, last-rc-change='Fri Mar 14 18:32:48 2014',
>>> queued=0ms, exec=0ms
>>> ----
>>>
>>> Just for reference, a similar problem did not occur with crm_standby.
>>> $ crm_standby -U X3650H -v on
>>>
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>