Can you define "not correctly" please?
I'd rather not ignore such behavior.

The machine would come up but not join the cluster. Checking the status of openais would show it as "Running", yet crm status would show:

Connection to cluster failed: connection failed

A look at the log file shows:

Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service RELEASE 'subrev 1152 version 0.80'
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service: started and ready to provide service.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Token Timeout (3000 ms) retransmit timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] token hold (225 ms) retransmits before loss (10 retrans)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] join (60 ms) send_join (0 ms) consensus (1500 ms) merge (200 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (20 messages)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] send threads (0 threads)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token expired timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token problem counter (2000 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP threshold (10 problem count)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP mode set to passive.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] heartbeat_failures_allowed (0)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] max_network_delay (50 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.0.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Created or loaded sequence id 112.10.0.0.22 for this ring.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.1.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] entering GATHER state from 15.
Aug 12 07:57:17 phys-file02 openais[9380]: [crm ] info: process_ais_conf: Reading configure
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: config_find_next: Processing additional logging options...
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Found 'on' for option: debug
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Defaulting to 'off' for option: to_file
Aug 12 07:57:21 phys-file02 crm_shadow: [9396]: info: Invoked: crm_shadow

I try to stop openais, but it fails; the stop command just keeps printing progress dots:

[r...@phys-file02 log]# /etc/init.d/openais stop
Stopping OpenAIS daemon (aisexec): ..............................................

I have to Ctrl+C out of it and then kill aisexec by hand:

[r...@phys-file02 log]# pkill -9 aisexec
[r...@phys-file02 log]# ps -ef | grep ais
root      9639  5760  0 08:01 pts/1    00:00:00 grep ais

Then I start openais again and crm starts correctly.

[r...@phys-file02 log]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[r...@phys-file02 log]# crm status


============
Last updated: Wed Aug 12 08:01:33 2009
Stack: openais
Current DC: phys-file01.physics.gatech.edu - partition with quorum
Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]

Master/Slave Set: ms-drbd_export
        Masters: [ phys-file01.physics.gatech.edu ]
        Slaves: [ phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_scratch
        Masters: [ phys-file01.physics.gatech.edu ]
        Slaves: [ phys-file02.physics.gatech.edu ]
Resource Group: fileserver
    fs_export   (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
    fs_scratch  (ocf::heartbeat:Filesystem):    Started phys-file01.physics.gatech.edu
    virtual-ip-1        (ocf::heartbeat:IPaddr2):       Started phys-file01.physics.gatech.edu
    nfs (lsb:nfs):      Started phys-file01.physics.gatech.edu
    samba       (lsb:smb):      Started phys-file01.physics.gatech.edu
Clone Set: pingd-clone
        Started: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
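
The manual recovery shown above (a stop that hangs, then kill -9 on aisexec, then a fresh start) could be scripted roughly as follows. This is only a sketch of a workaround, not a fix for the failed join itself. The function name force_restart_openais and the STOP_TIMEOUT and INIT_SCRIPT variables are hypothetical, and the sketch assumes the coreutils timeout command is installed.

```shell
#!/bin/sh
# Hypothetical helper; not part of openais or Pacemaker.
# Assumes coreutils `timeout` is available on the node.

# Seconds to wait for a clean stop before giving up (assumed value).
STOP_TIMEOUT=${STOP_TIMEOUT:-60}
# Init script used in the transcript above.
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/openais}

force_restart_openais() {
    # Give the normal stop a bounded amount of time instead of
    # watching the dots forever.
    if ! timeout "$STOP_TIMEOUT" "$INIT_SCRIPT" stop; then
        # Stop hung or failed: kill any leftover aisexec, as was done
        # by hand. pkill exits non-zero when nothing matched, which is
        # fine here.
        pkill -9 aisexec || true
    fi
    "$INIT_SCRIPT" start
}
```

Calling force_restart_openais as root on the affected node would mirror the steps done by hand; whether crm then connects still has to be checked with crm status.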

I am not quite sure how to fix this so that openais always starts crm correctly. My drbd interfaces are bonded, but they are set to active-backup (mode 1), which is failover only, with no round robin or teaming, etc.

[r...@phys-file02 log]# cat /proc/net/bonding/bond1 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[r...@phys-file02 log]# cat /proc/net/bonding/bond2 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[r...@phys-file02 log]# ifconfig bond1 | grep "inet addr"
          inet addr:10.0.0.22  Bcast:10.0.0.255  Mask:255.255.255.0
[r...@phys-file02 log]# ifconfig bond2 | grep "inet addr"
          inet addr:10.0.1.22  Bcast:10.0.1.255  Mask:255.255.255.0

[r...@phys-file02 log]# grep addr /etc/ais/openais.conf
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.0.1
                bindnetaddr: 10.0.1.0
                mcastaddr: 226.94.1.1

On the other node:

[r...@phys-file01 ~]# /etc/init.d/openais restart
Stopping OpenAIS daemon (aisexec): ..........OK
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[r...@phys-file01 ~]# crm status

Connection to cluster failed: connection failed
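
To tell a slow start from a stuck one, crm status could be retried for a while after the restart instead of being run once. The sketch below is a generic retry loop; the name wait_for is hypothetical, and using it with crm status assumes that crm exits non-zero when it prints "Connection to cluster failed", which has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the
# attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the command's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_for 30 2 crm status || echo "crm still cannot connect" would poll for about a minute before reporting a failure.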

Diego

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
