> Can you define "not correctly" please?
> I'd rather not ignore such behavior.
The machine would come up but not join the cluster. Checking the status
of openais showed "Running", but crm status showed:
Connection to cluster failed: connection failed
A look at the log file shows:
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service RELEASE 'subrev 1152 version 0.80'
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] AIS Executive Service: started and ready to provide service.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Token Timeout (3000 ms) retransmit timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] token hold (225 ms) retransmits before loss (10 retrans)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] join (60 ms) send_join (0 ms) consensus (1500 ms) merge (200 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (20 messages)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] send threads (0 threads)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token expired timeout (294 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP token problem counter (2000 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP threshold (10 problem count)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] RRP mode set to passive.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] heartbeat_failures_allowed (0)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] max_network_delay (50 ms)
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.0.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Created or loaded sequence id 112.10.0.0.22 for this ring.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] The network interface [10.0.1.22] is now up.
Aug 12 07:57:17 phys-file02 openais[9380]: [TOTEM] entering GATHER state from 15.
Aug 12 07:57:17 phys-file02 openais[9380]: [crm ] info: process_ais_conf: Reading configure
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: config_find_next: Processing additional logging options...
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Found 'on' for option: debug
Aug 12 07:57:17 phys-file02 openais[9380]: [MAIN ] info: get_config_opt: Defaulting to 'off' for option: to_file
Aug 12 07:57:21 phys-file02 crm_shadow: [9396]: info: Invoked: crm_shadow
When I try to stop openais, it hangs; the dots just keep appearing in the
stop command's progress output:
[r...@phys-file02 log]# /etc/init.d/openais stop
Stopping OpenAIS daemon (aisexec):
..............................................
I have to Ctrl+C out of it and then kill aisexec by hand:
[r...@phys-file02 log]# pkill -9 aisexec
[r...@phys-file02 log]# ps -ef | grep ais
root 9639 5760 0 08:01 pts/1 00:00:00 grep ais
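The manual workaround above (interrupting the hung stop, then `pkill -9 aisexec`) could be scripted roughly like this. It is a hypothetical sketch, not part of the openais init script: the function name `stop_openais`, the `INIT_SCRIPT`/`STOP_TIMEOUT` variables and the 60-second default are my own choices, and it assumes GNU coreutils `timeout` and procps `pkill`/`pgrep` are installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of openais): stop openais, and if the init
# script fails or hangs past a deadline, force-kill aisexec like the manual steps.
# Assumes GNU coreutils `timeout` and procps `pkill`/`pgrep`.

INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/openais}   # init script path used in this post
STOP_TIMEOUT=${STOP_TIMEOUT:-60}                  # seconds; arbitrary choice

stop_openais() {
    # timeout terminates the init script (exit status 124) if it runs too long.
    if timeout "$STOP_TIMEOUT" "$INIT_SCRIPT" stop; then
        return 0
    fi
    echo "openais stop failed or timed out after ${STOP_TIMEOUT}s; killing aisexec" >&2
    pkill -9 -x aisexec
    sleep 1   # give the killed process a moment to exit
    # Succeed only if no process named exactly "aisexec" is left.
    if pgrep -x aisexec >/dev/null; then
        echo "aisexec is still running" >&2
        return 1
    fi
    return 0
}
```

It would then be used as `stop_openais && /etc/init.d/openais start`. Matching with `-x` (exact process name) is a deliberate choice so that, for example, an interactive `grep ais` is never killed.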
Then I start openais again and crm starts correctly.
[r...@phys-file02 log]# /etc/init.d/openais start
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[r...@phys-file02 log]# crm status
============
Last updated: Wed Aug 12 08:01:33 2009
Stack: openais
Current DC: phys-file01.physics.gatech.edu - partition with quorum
Version: 1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_export
Masters: [ phys-file01.physics.gatech.edu ]
Slaves: [ phys-file02.physics.gatech.edu ]
Master/Slave Set: ms-drbd_scratch
Masters: [ phys-file01.physics.gatech.edu ]
Slaves: [ phys-file02.physics.gatech.edu ]
Resource Group: fileserver
fs_export (ocf::heartbeat:Filesystem): Started phys-file01.physics.gatech.edu
fs_scratch (ocf::heartbeat:Filesystem): Started phys-file01.physics.gatech.edu
virtual-ip-1 (ocf::heartbeat:IPaddr2): Started phys-file01.physics.gatech.edu
nfs (lsb:nfs): Started phys-file01.physics.gatech.edu
samba (lsb:smb): Started phys-file01.physics.gatech.edu
Clone Set: pingd-clone
Started: [ phys-file01.physics.gatech.edu phys-file02.physics.gatech.edu ]
I am not quite sure how to fix this so that openais always starts crm
correctly. My DRBD interfaces are bonded, but they are set to mode 1
(active-backup), which is failover only: no round robin, no teaming, etc.
[r...@phys-file02 log]# cat /proc/net/bonding/bond1 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[r...@phys-file02 log]# cat /proc/net/bonding/bond2 | grep Mode
Bonding Mode: fault-tolerance (active-backup)
[r...@phys-file02 log]# ifconfig bond1 | grep "inet addr"
inet addr:10.0.0.22 Bcast:10.0.0.255 Mask:255.255.255.0
[r...@phys-file02 log]# ifconfig bond2 | grep "inet addr"
inet addr:10.0.1.22 Bcast:10.0.1.255 Mask:255.255.255.0
[r...@phys-file02 log]# grep addr /etc/ais/openais.conf
bindnetaddr: 10.0.0.0
mcastaddr: 226.94.0.1
bindnetaddr: 10.0.1.0
mcastaddr: 226.94.1.1
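The two things checked above, the bond mode and whether each ring's bindnetaddr fits its interface, could be verified with a small sh sketch like this one. The function names are hypothetical, the arithmetic assumes IPv4 dotted quads without leading zeros, and treating bindnetaddr as "interface address AND netmask" is my reading of the openais.conf documentation rather than something confirmed in this thread. With the values posted here, 10.0.0.22/255.255.255.0 gives 10.0.0.0 and 10.0.1.22/255.255.255.0 gives 10.0.1.0, which match both bindnetaddr lines.

```shell
#!/bin/sh
# Hypothetical sanity checks for the bonding and openais.conf output.

# Print the mode text from a /proc/net/bonding/<bond> status file,
# e.g. "fault-tolerance (active-backup)".
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "$1"
}

# Succeed if the bond is in active-backup (Linux bonding mode 1).
is_active_backup() {
    case "$(bond_mode "$1")" in
        *"(active-backup)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Convert a dotted-quad IPv4 address to an integer.
# Assumes no leading zeros (the shell would read "010" as octal).
ip_to_int() {
    (
        IFS=.
        set -- $1
        echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    )
}

# Convert an integer back to a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Network address = interface address AND netmask.
network_address() {
    int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeed if bindnetaddr ($3) is the network of address $1 with netmask $2.
bindnetaddr_matches() {
    [ "$(network_address "$1" "$2")" = "$3" ]
}
```

For example, `is_active_backup /proc/net/bonding/bond1 && echo "bond1 OK"` and `bindnetaddr_matches 10.0.0.22 255.255.255.0 10.0.0.0 && echo "ring 0 OK"`.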
On the other node:
[r...@phys-file01 ~]# /etc/init.d/openais restart
Stopping OpenAIS daemon (aisexec): ..........OK
Starting OpenAIS daemon (aisexec): starting... rc=0: OK
[r...@phys-file01 ~]# crm status
Connection to cluster failed: connection failed
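From this output alone it isn't clear whether the node would have connected after a short wait or never. One hypothetical way to tell the two apart is to poll instead of running `crm status` once. In this sketch `wait_for_crm`, `crm_status_output`, the attempt count and the interval are my own choices; the only thing it relies on from this thread is the "Connection to cluster failed" message, so any other output (even an error such as crm not being installed) counts as connected.

```shell
#!/bin/sh
# Hypothetical helper: poll "crm status" until it stops reporting
# "Connection to cluster failed", or give up.

# Wrapped in a function so it is easy to replace when testing.
crm_status_output() {
    crm status 2>&1
}

# $1 = number of attempts (default 30), $2 = seconds between attempts (default 2).
# Returns 0 as soon as the failure message is absent, 1 if it never goes away.
wait_for_crm() {
    local attempts="${1:-30}" interval="${2:-2}"
    while [ "$attempts" -gt 0 ]; do
        if ! crm_status_output | grep -q "Connection to cluster failed"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, `/etc/init.d/openais start && wait_for_crm 30 2 || echo "still not connected"` waits roughly a minute before giving up.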
Diego
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker