The stack trace makes it look like a logging deadlock. I'll ask the openais maintainer about it.
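One note on the trace quoted below: gdb numbers threads from 1, and the attach output lists two threads (LWP 19423 and LWP 19425), so the backtrace of thread 2 is still missing. In gdb, `info threads` lists the threads and `thread apply all bt` dumps every stack in one go. As a rough illustration of how the quoted `where` output can be read, here is a small Python sketch. The helper names `frame_functions` and `lock_wait_caller` are made up for this example and are not part of gdb or openais; the frames are copied from the quoted output.

```python
import re

# Frames copied from the "where" output quoted below (thread 1 of aisexec).
BACKTRACE = """\
#0 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003be088c74d in _L_lock_1685 () from /lib64/libc.so.6
#2 0x0000003be088c497 in __tz_convert () from /lib64/libc.so.6
#3 0x0000000000418a16 in _log_printf ()
#4 0x0000000000418cb1 in internal_log_printf2 ()
#5 0x00002aaaab0b8819 in pcmk_plugin_init () from /usr/libexec/lcrso/pacemaker.lcrso
"""

# Matches lines such as "#3 0x0000000000418a16 in _log_printf ()".
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \(")


def frame_functions(backtrace):
    """Return the function names of a gdb backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def lock_wait_caller(backtrace):
    """If the innermost frame is a lock wait, return the first frame above
    it whose name does not mention a lock (the function that wanted it)."""
    names = frame_functions(backtrace)
    if not names or "lock" not in names[0].lower():
        return None
    for name in names[1:]:
        if "lock" not in name.lower():
            return name
    return None


print(lock_wait_caller(BACKTRACE))  # prints "__tz_convert"
```

So the main thread is blocked on a libc lock taken inside `__tz_convert`, which was reached from `_log_printf`. That fits the logging-deadlock reading, but the trace alone does not show who holds the lock; the backtrace of the second thread would help there.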
On Fri, Aug 21, 2009 at 5:11 PM, Diego Remolina <diego.remol...@physics.gatech.edu> wrote:
> Here is what I am seeing now right after stopping openais, updating
> heartbeat and pacemaker and trying to start openais again:
>
> [r...@phys-file02 ~]# /etc/init.d/openais status
> Stopped
> [r...@phys-file02 ~]# /etc/init.d/openais start
> Starting OpenAIS daemon (aisexec): starting... rc=0: OK
> [r...@phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [r...@phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [r...@phys-file02 ~]# crm status
>
> Connection to cluster failed: connection failed
> [r...@phys-file02 ~]# yum -y install gdb
>
> At this point, I installed gdb and here is what I get:
>
> [r...@phys-file02 ~]# ps -ef | grep aisexec
> root 19423 1 0 11:01 pts/1 00:00:00 aisexec
> root 19520 19241 0 11:02 pts/1 00:00:00 grep aisexec
> [r...@phys-file02 ~]# gdb aisexec 19423
> GNU gdb Fedora (6.8-27.el5)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
> Attaching to program: /usr/sbin/aisexec, process 19423
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 0x2ae946b8fec0 (LWP 19423)]
> [New Thread 0x40638fe0 (LWP 19425)]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/libexec/lcrso/objdb.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/objdb.lcrso
> Reading symbols from /usr/libexec/lcrso/aisparser.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/aisparser.lcrso
> Reading symbols from /usr/libexec/lcrso/pacemaker.lcrso...done.
> Loaded symbols for /usr/libexec/lcrso/pacemaker.lcrso
> Reading symbols from /usr/lib64/libplumb.so.2...done.
> Loaded symbols for /usr/lib64/libplumb.so.2
> Reading symbols from /usr/lib64/libpils.so.2...done.
> Loaded symbols for /usr/lib64/libpils.so.2
> Reading symbols from /usr/lib64/libbz2.so.1...done.
> Loaded symbols for /usr/lib64/libbz2.so.1
> Reading symbols from /usr/lib64/libxslt.so.1...done.
> Loaded symbols for /usr/lib64/libxslt.so.1
> Reading symbols from /usr/lib64/libxml2.so.2...done.
> Loaded symbols for /usr/lib64/libxml2.so.2
> Reading symbols from /lib64/libuuid.so.1...done.
> Loaded symbols for /lib64/libuuid.so.1
> Reading symbols from /lib64/libpam.so.0...done.
> Loaded symbols for /lib64/libpam.so.0
> Reading symbols from /lib64/librt.so.1...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libglib-2.0.so.0...done.
> Loaded symbols for /lib64/libglib-2.0.so.0
> Reading symbols from /usr/lib64/libltdl.so.3...done.
> Loaded symbols for /usr/lib64/libltdl.so.3
> Reading symbols from /usr/lib64/libz.so.1...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libaudit.so.0...done.
> Loaded symbols for /lib64/libaudit.so.0
> Reading symbols from /lib64/libgcc_s.so.1...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> (gdb) where
> #0 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> #1 0x0000003be088c74d in _L_lock_1685 () from /lib64/libc.so.6
> #2 0x0000003be088c497 in __tz_convert () from /lib64/libc.so.6
> #3 0x0000000000418a16 in _log_printf ()
> #4 0x0000000000418cb1 in internal_log_printf2 ()
> #5 0x00002aaaab0b8819 in pcmk_plugin_init () from /usr/libexec/lcrso/pacemaker.lcrso
> #6 0x00002aaaab0b946a in pcmk_startup () from /usr/libexec/lcrso/pacemaker.lcrso
> #7 0x000000000041a422 in openais_service_link_and_init ()
> #8 0x000000000041a5c8 in openais_service_defaults_link_and_init ()
> #9 0x0000000000418117 in main ()
> (gdb) thread 0
> Thread ID 0 not known.
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x2ae946b8fec0 (LWP 19423))]#0 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> (gdb) where
> #0 0x0000003be08dee5e in __lll_lock_wait_private () from /lib64/libc.so.6
> #1 0x0000003be088c74d in _L_lock_1685 () from /lib64/libc.so.6
> #2 0x0000003be088c497 in __tz_convert () from /lib64/libc.so.6
> #3 0x0000000000418a16 in _log_printf ()
> #4 0x0000000000418cb1 in internal_log_printf2 ()
> #5 0x00002aaaab0b8819 in pcmk_plugin_init () from /usr/libexec/lcrso/pacemaker.lcrso
> #6 0x00002aaaab0b946a in pcmk_startup () from /usr/libexec/lcrso/pacemaker.lcrso
> #7 0x000000000041a422 in openais_service_link_and_init ()
> #8 0x000000000041a5c8 in openais_service_defaults_link_and_init ()
> #9 0x0000000000418117 in main ()
> (gdb) thread 3
> Thread ID 3 not known.
> (gdb) thread 4
> Thread ID 4 not known.
> (gdb) thread 5
> Thread ID 5 not known.
> (gdb) thread 6
> Thread ID 6 not known.
>
> Like I said, I have not used gdb before, so if I am doing something wrong,
> let me know what I should do or where can I read some docs to try and
> understand what I am supposed to do with it to give you useful output.
>
> Here is the log file where I do not see any valuable crm info up until the
> point where I installed gdb on the system
>
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] AIS Executive Service RELEASE 'subrev 1152 version 0.80'
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
> Aug 21 11:01:31 phys-file02 openais[19423]: [MAIN ] AIS Executive Service: started and ready to provide service.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Token Timeout (3000 ms) retransmit timeout (294 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] token hold (225 ms) retransmits before loss (10 retrans)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] join (60 ms) send_join (0 ms) consensus (1500 ms) merge (200 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (20 messages)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] send threads (0 threads)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP token expired timeout (294 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP token problem counter (2000 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP threshold (10 problem count)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] RRP mode set to passive.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] heartbeat_failures_allowed (0)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] max_network_delay (50 ms)
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] The network interface [10.0.0.22] is now up.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Created or loaded sequence id 184.10.0.0.22 for this ring.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] The network interface [10.0.1.22] is now up.
> Aug 21 11:01:32 phys-file02 openais[19423]: [TOTEM] entering GATHER state from 15.
> Aug 21 11:01:32 phys-file02 openais[19423]: [crm ] info: process_ais_conf: Reading configure
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: config_find_next: Processing additional logging options...
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Found 'on' for option: debug
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Found 'daemon' for option: syslog_facility
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: config_find_next: Processing additional service options...
> Aug 21 11:01:32 phys-file02 openais[19423]: [MAIN ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Aug 21 11:01:58 phys-file02 crm_shadow: [19439]: info: Invoked: crm_shadow
> Aug 21 11:01:58 phys-file02 crm_shadow: [19453]: info: Invoked: crm_shadow
> Aug 21 11:01:58 phys-file02 crm_shadow: [19455]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19467]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19481]: info: Invoked: crm_shadow
> Aug 21 11:02:01 phys-file02 crm_shadow: [19483]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19495]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19509]: info: Invoked: crm_shadow
> Aug 21 11:02:03 phys-file02 crm_shadow: [19511]: info: Invoked: crm_shadow
> Aug 21 11:02:16 phys-file02 yum: Installed: gdb-6.8-27.el5.x86_64
>
> Again, killing aisexec and restarting openais seems to work.
>
> [r...@phys-file02 ~]# /etc/init.d/openais stop
> Stopping OpenAIS daemon (aisexec):
> ......................................................................................................................................
> [r...@phys-file02 ~]# pkill -9 aisexec
> [r...@phys-file02 ~]# ps -ef | grep aise
> root 19546 19241 0 11:10 pts/1 00:00:00 grep aise
> [r...@phys-file02 ~]# /etc/init.d/openais start
> Starting OpenAIS daemon (aisexec): starting... rc=0: OK
> [r...@phys-file02 ~]# crm status
>
>
> ============
> Last updated: Fri Aug 21 11:10:51 2009
> Stack: openais
> Current DC: phys-file01.physics.gatech.edu - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
>
> Master/Slave Set: ms-drbd_export
>     Masters: [ phys-file01.physics.gatech.edu ]
>     Slaves: [ phys-file02.physics.gatech.edu ]
> Master/Slave Set: ms-drbd_scratch
>     Masters: [ phys-file01.physics.gatech.edu ]
>     Slaves: [ phys-file02.physics.gatech.edu ]
> Resource Group: fileserver
>     fs_export (ocf::heartbeat:Filesystem): Started phys-file01.physics.gatech.edu
>     fs_scratch (ocf::heartbeat:Filesystem): Started phys-file01.physics.gatech.edu
>     virtual-ip-1 (ocf::heartbeat:IPaddr2): Started phys-file01.physics.gatech.edu
>     nfs (lsb:nfs): Started phys-file01.physics.gatech.edu
>     samba (lsb:smb): Started phys-file01.physics.gatech.edu
> Clone Set: pingd-clone
>     Started: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
> [r...@phys-file02 ~]#
>
> Diego
>
> Andrew Beekhof wrote:
>>
>> On Wed, Aug 12, 2009 at 3:35 PM, Diego Remolina<diego.remol...@physics.gatech.edu> wrote:
>>>>
>>>> could you instead attach to it with gdb and see what it was doing?
>>>
>>> I will try, but cannot promise it will be soon, beginning of the semester
>>> is very busy and I am not familiar with gdb...
>>
>> gdb aisexec $PID_OF_AISEXEC
>> # where
>>
>> then, for every thread it has:
>>
>> # thread 0
>> # where
>> # thread 1
>> # where
>> ...
>>
>> I think you get the idea :-)
>>
>>> RedHat.... one is x86_64, the other is the 32 bit one....
>>>
>>> [r...@phys-file01 windows7]# rpm -qa --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n" | grep openais
>>> openais-0.80.5-13.1.x86_64
>>> libopenais2-0.80.5-13.1.i386
>>> libopenais2-0.80.5-13.1.x86_64
>>
>> how about trying with just one?
>> maybe something is confused.
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker