We hit the same issue in an OCFS2 cluster of XenServer 7.1.1 hosts.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/613793

Title:
  o2cb stopping Failed

Status in ocfs2-tools package in Ubuntu:
  Confirmed

Bug description:
  Binary package hint: ocfs2-tools

  Ubuntu release:
  Description:    Ubuntu 10.04.1 LTS
  Release:        10.04
  Package version:
  ocfs2-tools                      1.4.3-1

  The script /etc/init.d/o2cb exits with an error when stopped, and the
  services do not stop. Here is the error message:

  /etc/init.d/o2cb stop
  Stopping O2CB cluster ocfs2: Failed
  Unable to stop cluster as heartbeat region still active

  I have identified a first error in the script. In the function
  clean_heartbeat the following if:

  if [ ! -f "$(configfs_path)/cluster/${CLUSTER}/heartbeat/*" ]
      then
          return
  fi

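  is always true, so the function always returns early: the quotes keep
  the shell from expanding the *, so -f tests for a regular file that is
  literally named "*". If the intention was to check that the directory
  exists, the code should be: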

  if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
      then
          echo "OK"
          return
  fi
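
  The difference between the two tests can be reproduced outside the init
  script. The following is only a minimal sketch: HB_DIR is a hypothetical
  temporary directory standing in for
  $(configfs_path)/cluster/${CLUSTER}/heartbeat, and the two result
  variables exist purely for illustration.

```shell
#!/bin/sh
# Assumption: a temporary directory stands in for the configfs heartbeat
# directory, with one subdirectory playing the role of a heartbeat region.
HB_DIR="$(mktemp -d)"
mkdir "${HB_DIR}/FC046AD7B2584E7EB12A7293993C81B0"

# Original test: inside double quotes the "*" is not expanded, so -f looks
# for a regular file literally named "*". None exists, so "! -f" succeeds.
if [ ! -f "${HB_DIR}/*" ]; then
    ORIGINAL_TEST="returns early"
else
    ORIGINAL_TEST="continues"
fi

# Proposed test: checks the heartbeat directory itself.
if [ ! -d "${HB_DIR}/" ]; then
    FIXED_TEST="returns early"
else
    FIXED_TEST="continues"
fi

echo "original test: ${ORIGINAL_TEST}"
echo "fixed test:    ${FIXED_TEST}"

rm -rf "${HB_DIR}"
```

  With a region present, the original test still takes the early return,
  while the -d test lets the function carry on to the cleanup loop.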

  An error persists even after this change.

  /etc/init.d/o2cb stop
  Cleaning heartbeat on ocfs2: Failed
  At least one heartbeat region still active

  I added some debugging lines, changing the function as follows:

  #
  # clean_heartbeat()
  # Removes the inactive heartbeat regions
  #
  clean_heartbeat()
  {
      if [ "$#" -lt "1" -o -z "$1" ]
      then
          echo "clean_heartbeat(): Requires an argument" >&2
          return 1
      fi
      CLUSTER="$1"

      if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" ]
      then
          echo "OK"
          return
      fi

      echo -n "Cleaning heartbeat on ${CLUSTER}: "

      ls -1 "$(configfs_path)/cluster/${CLUSTER}/heartbeat/" | while read HBUUID
      do
          if [ ! -d "$(configfs_path)/cluster/${CLUSTER}/heartbeat/${HBUUID}" ]
          then
              continue
          fi

  echo
  echo "DEBUG ocfs2_hb_ctl -I -u ${HBUUID} 2>&1"
          OUTPUT="`ocfs2_hb_ctl -I -u ${HBUUID} 2>&1`"
          if [ $? != 0 ]
          then
              echo "Failed"
              echo "${OUTPUT}" >&2
              exit 1
          fi

  echo "DEBUG ${OUTPUT}"
          REF="`echo ${OUTPUT} | awk '/refs/ {print $2; exit;}' 2>&1`"
  echo "DEBUG REF=$REF"
          if [ $REF != 0 ]
          then
             echo "Failed"
             echo "At least one heartbeat region still active" >&2
             exit 1
          else
             OUTPUT="`ocfs2_hb_ctl -K -u ${HBUUID} 2>&1`"
          fi
      done
      if [ $? = 1 ]
      then
          exit 1
      fi
      echo "OK"
  }
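
  A note on the "if [ $? = 1 ]" after the loop: in bash and other shells
  that run each part of a pipeline in a subshell, the "exit 1" inside
  "ls ... | while read" only leaves the loop's subshell, so the status has
  to be checked again after "done". A minimal sketch of that behaviour,
  where check_regions and the region names are hypothetical stand-ins and
  not part of the o2cb script:

```shell
#!/bin/sh
# Hypothetical stand-in for clean_heartbeat(): the loop body fails on the
# first line it reads, as the real loop does when a region still has refs.
check_regions()
{
    printf '%s\n' "region-a" "region-b" | while read HBUUID
    do
        echo "checking ${HBUUID}"
        exit 1              # leaves only the subshell running the loop
    done
    LOOP_STATUS=$?          # status of the pipeline, i.e. of the loop
    echo "still inside the function, loop status: ${LOOP_STATUS}"
    return "${LOOP_STATUS}"
}

RESULT=0
check_regions || RESULT=$?
echo "function returned: ${RESULT}"
```

  The function keeps running after the loop, which is why the status is
  tested explicitly before "OK" is printed.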

  The new output is:

  /etc/init.d/o2cb stop
  Cleaning heartbeat on ocfs2:
  DEBUG ocfs2_hb_ctl -I -u FC046AD7B2584E7EB12A7293993C81B0 2>&1
  DEBUG FC046AD7B2584E7EB12A7293993C81B0: 2 refs
  DEBUG REF=2
  Failed
  At least one heartbeat region still active
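
  The REF value shown above comes from the awk filter in the function. A
  minimal sketch of that parsing step, using the captured output line as a
  fixed string instead of calling ocfs2_hb_ctl (STATUS is a hypothetical
  variable added for illustration):

```shell
#!/bin/sh
# Assumption: OUTPUT holds the line printed by "ocfs2_hb_ctl -I -u <uuid>",
# copied from the debug output instead of being produced by the real tool.
OUTPUT="FC046AD7B2584E7EB12A7293993C81B0: 2 refs"

# Same filter as in clean_heartbeat(): on the first line containing "refs",
# print the second whitespace-separated field.
REF="$(echo "${OUTPUT}" | awk '/refs/ {print $2; exit}')"
echo "REF=${REF}"

# Any non-zero reference count makes the function report a failure.
if [ "${REF}" != 0 ]; then
    STATUS="region still active"
else
    STATUS="region can be killed"
fi
echo "${STATUS}"
```

  Quoting ${REF} in the comparison also avoids a syntax error from [ if
  the filter ever returns an empty string.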

  At this point I checked the source code of ocfs2_hb_ctl. The command
  ocfs2_hb_ctl -I -u ${HBUUID} returns the number of references held in a
  semaphore used by the programs that manage OCFS2 filesystems. In the
  source file libo2cb/o2cb_api.c:
  - the function o2cb_mutex_down increases the second semaphore;
  - the function o2cb_mutex_up decreases the first semaphore;
  - the function __o2cb_get_ref increases the first semaphore;
  - the function __o2cb_drop_ref decreases the first semaphore.

  I have not found the point where the second semaphore is decreased.
  This could be the cause of the error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ocfs2-tools/+bug/613793/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
