We hit the same issue in an OCFS2 cluster of XenServer 7.1.1 hosts.

-- 
You received this bug notification because you are a member of Ubuntu High Availability Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/613793

Title:
  o2cb stopping Failed

Status in ocfs2-tools package in Ubuntu:
  Confirmed

Bug description:
  Binary package hint: ocfs2-tools

  Ubuntu release:
  Description: Ubuntu 10.04.1 LTS
  Release: 10.04

  Package version:
  ocfs2-tools 1.4.3-1

  The script /etc/init.d/o2cb exits with an error when stopped, and the services do not stop.
  Here is the error message:

  /etc/init.d/o2cb stop
  Stopping O2CB cluster ocfs2: Failed
  Unable to stop cluster as heartbeat region still active

  I have identified a first error in the script. In the function clean_heartbeat, the following test:

      if [ ! -f "$(configfs_path)/cluster/${CLUSTER}/heartbeat/*" ]
      then
          return
      fi

  is always true, so the function always returns early. The "*" sits inside double quotes and is never expanded, so the test looks for a file literally named "*".
  If the intention was to check that the directory exists, the code should be:

      if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
      then
          echo "OK"
          return
      fi

  An error persists even after this change:

  /etc/init.d/o2cb stop
  Cleaning heartbeat on ocfs2: Failed
  At least one heartbeat region still active

  I added some debugging lines, changing the function as follows:

      #
      # clean_heartbeat()
      # Removes the inactive heartbeat regions
      #
      clean_heartbeat()
      {
          if [ "$#" -lt "1" -o -z "$1" ]
          then
              echo "clean_heartbeat(): Requires an argument" >&2
              return 1
          fi
          CLUSTER="$1"

          if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
          then
              echo "OK"
              return
          fi

          echo -n "Cleaning heartbeat on ${CLUSTER}: "

          ls -1 "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" | while read HBUUID
          do
              if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/${HBUUID}" ]
              then
                  continue
              fi

              echo
              echo "DEBUG ocfs2_hb_ctl -I -u ${HBUUID} 2>&1"
              OUTPUT="`ocfs2_hb_ctl -I -u ${HBUUID} 2>&1`"
              if [ $? != 0 ]
              then
                  echo "Failed"
                  echo "${OUTPUT}" >&2
                  exit 1
              fi
              echo "DEBUG ${OUTPUT}"

              REF="`echo ${OUTPUT} | awk '/refs/ {print $2; exit;}' 2>&1`"
              echo "DEBUG REF=$REF"
              if [ $REF != 0 ]
              then
                  echo "Failed"
                  echo "At least one heartbeat region still active" >&2
                  exit 1
              else
                  OUTPUT="`ocfs2_hb_ctl -K -u ${HBUUID} 2>&1`"
              fi
          done
          if [ $? = 1 ]
          then
              exit 1
          fi
          echo "OK"
      }

  The new output is:

  /etc/init.d/o2cb stop
  Cleaning heartbeat on ocfs2:
  DEBUG ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0 2>&1
  DEBUG FC046AD7B2584E7EB12A7293993C81B0: 2 refs
  DEBUG REF=2
  Failed
  At least one heartbeat region still active

  At this point I checked the source code of ocfs2_hb_ctl. The command

      ocfs2_hb_ctl -I -u ${HBUUID}

  reports the number of references held in a semaphore used by the programs that manage OCFS2 filesystems.
  In the source file libo2cb/o2cb_api.c:
  - the function o2cb_mutex_down increases the second semaphore;
  - the function o2cb_mutex_up decreases the first semaphore;
  - the function __o2cb_get_ref increases the first semaphore;
  - the function __o2cb_drop_ref decreases the first semaphore.

  I have not found the point where the second semaphore is decreased. This could be the cause of the error.
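
The first problem in the report comes down to quoting: inside double quotes the shell does not expand "*", so the -f test looks for a file literally named "*". As a rough sketch only (has_hb_regions is a hypothetical helper name, not something from the o2cb init script, and the scratch directory merely stands in for the configfs heartbeat path), a check that actually looks for region directories could leave the glob unquoted:

```shell
#!/bin/sh
# has_hb_regions DIR
# Hypothetical helper (not part of /etc/init.d/o2cb).
# Returns 0 if DIR contains at least one subdirectory, 1 otherwise.
has_hb_regions() {
    dir="$1"
    [ -d "$dir" ] || return 1
    # The glob is outside the quotes, so the shell expands it. If nothing
    # matches, the pattern stays literal and the -d test on it fails.
    for entry in "$dir"/*/
    do
        [ -d "$entry" ] && return 0
    done
    return 1
}

# Demonstration on a scratch directory (an assumption for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/heartbeat/REGION1"

# Quoted glob: true even though a region directory exists.
if [ ! -f "$demo/heartbeat/*" ]
then
    echo "quoted test: no file literally named *"
fi

# Unquoted glob inside the helper: finds the region directory.
if has_hb_regions "$demo/heartbeat"
then
    echo "helper: region directory present"
fi

rm -rf "$demo"
```

The helper only answers "is there anything to clean"; deciding whether a region is still referenced remains the job of ocfs2_hb_ctl, as in the report.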
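
A side note on the debugging version of clean_heartbeat: in bash and dash, each element of a pipeline runs in a subshell, so the "exit 1" inside the "ls | while read" loop ends only that subshell, and the function has to check "$?" after "done" to notice the failure. The sketch below illustrates that pattern in isolation; check_items and its arguments are hypothetical names used for illustration, and the behaviour described assumes bash or dash (some other shells run the last pipeline element in the current shell).

```shell
#!/bin/sh
# check_items STOP ITEM...
# Hypothetical helper for illustration only.
# Returns 1 if any ITEM equals STOP, 0 otherwise.
check_items() {
    stop="$1"
    shift
    printf '%s\n' "$@" | while read -r item
    do
        if [ "$item" = "$stop" ]
        then
            # In bash and dash this leaves only the subshell running the loop.
            exit 1
        fi
    done
    # The pipeline's status is the loop's exit status; pass it on explicitly.
    if [ $? -ne 0 ]
    then
        return 1
    fi
    return 0
}

if check_items active idle active idle
then
    echo "all clear"
else
    echo "found an active item, and the script is still running"
fi
```

Returning the status instead of calling exit a second time leaves the decision to the caller, which is a design choice of this sketch rather than what the init script does.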