Hi,

On Wed, Dec 23, 2009 at 02:20:32PM +0100, Sebastian Reitenbach wrote:
> Hi,
>
> as said, I updated another cluster, this time a three-node cluster, and I
> took some time with one XEN resource configured to test a bit with it.
> The XEN resource also had the pingd constraint defined.
>
> There I observed the following things, see below:
>
> On Monday 21 December 2009 12:44:17 pm Dejan Muhamedagic wrote:
> > Hi,
> >
> > > I wonder about some things:
> > > 1. why three of the pingd resources had no description shown after
> > > leaving the maintenance mode.
> I have seen something similar too, but did not observe anything strange
> that would explain that to me.
>
> > > 2. why all XEN resources were shut down after leaving the maintenance
> > > mode. Here I have a theory: In maintenance mode, the pingd attribute did
> > > not get updated, and because heartbeat was restarted on each node, the
> > > attribute was not set. Therefore when leaving the maintenance mode,
> > > pacemaker decided to shut down the XEN resources, because the pingd
> > > attribute was not set.
> >
> > Sounds like a plausible explanation.
> That seems to be the case, I tested:
> 1. put the whole cluster into maintenance mode. Then all resources went into
> maintenance mode.
> 2. I did not restart heartbeat on any of the nodes
> 3. I disabled the maintenance mode again, everything stayed fine, the
> XEN resource was still running.
> The pingd=1 attribute for the node where the XEN resource was running was
> still there.
>
> Then I tested again:
> 1. put the whole cluster into maintenance mode. Then all resources went into
> maintenance mode.
> 2. I now restarted heartbeat on each of the nodes
> 3. I disabled the maintenance mode again, and the XEN resource was shut down
> immediately
> 4. I waited, and waited and waited, but the pingd resources did not update
> the pingd=1 attribute
> 5. Stop the pingd clone, I saw the pingd attribute was set to 0
> 6. Start the pingd clone, I saw the pingd attribute was set to 1
> 7. After a while, the XEN resource was starting
>
> Therefore, my workaround to the problem, in case I need to restart heartbeat,
> now is:
> 1. Put each resource depending on the pingd attribute into maintenance mode
> separately
> 2. Stop pingd
> 3. restart heartbeat on each of the cluster nodes
> 4. wait until heartbeat is back again, and then start the pingd resource
> again
> 5. watch the logfiles until the pingd attribute for the nodes gets set to 1
> 6. put each resource separately into maintained mode again
> 7. everything is fine then!
> So I wonder whether this is by design that pingd doesn't update the
> attribute when it is transitioning from maintenance to maintained mode?
> Or could this be considered a bug or something for an enhancement request?

I'd say it's a bug. Not sure where though, in pingd or the RA. Is
there anything in the logs?

> > > 3. Why the pingd attribute was not set immediately after pingd started
> > > up, and was able to ping the ping node. After the pingd was started, then
> > > it waited 60 seconds (the timeout value) to set the attribute so that
> > > then the XEN resources were able to start, due to their location
> > > constraint.
> I must have observed this incorrectly somehow. The pingd attribute was set
> only some seconds after pingd was started on the nodes. However, the
> dependent XEN resources were only started about a minute after that happened.
> Is there any parameter I can use to shorten that time frame from a minute to
> some seconds?

Don't think so. I don't understand why it waited.

> > > 4. Maybe the answers to the other questions will answer this already:
> > > Why the cluster behaved that strangely at all with the large timeout
> > > values set in ha.cf.
> I also tested here with larger values for deadtime and initdead in the
> /etc/ha.d/ha.cf file, and did not observe any strange behaviour. So I guess
> that observation was just a coincidence from the above....
>
> > > I could also send a cluster-report in case it may help to figure out what
> > > was wrong here, I just did not want to send a large attachment to the
> > > list in the first place.
> >
> > Probably the best to open a bugzilla and attach there the report.
> > I guess that special care is necessary on setting resources to
> > the unmanaged mode in case there are constraints which depend on
> > pingd attributes.
> Due to my further observations, no real need to open a bug report anymore.

It's not clear what happened to the pingd attribute: was it updated
immediately or not? A Xen resource also started later than expected;
that should be investigated too.
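
To make the theory from point 2 concrete, here is a minimal Python sketch of
how such a location rule would react in the two tests. It rests on
assumptions: the actual constraint was not posted in this thread, so the
sketch assumes a rule that forbids a node whenever the pingd attribute is
undefined or not greater than 0, and it follows the theory that restarting
heartbeat drops the attribute. The name xen_may_run is made up for
illustration and is not part of pacemaker.

```python
from typing import Dict


def xen_may_run(node_attrs: Dict[str, str], attr: str = "pingd") -> bool:
    """Assumed location rule: the node is allowed only if `attr` is
    defined and its value is greater than 0."""
    value = node_attrs.get(attr)
    if value is None:
        # Attribute not defined at all: the assumed rule forbids the node.
        return False
    try:
        return int(value) > 0
    except ValueError:
        # A non-numeric value is treated as "not reachable" in this sketch.
        return False


# First test: heartbeat not restarted, pingd=1 is still present.
attrs = {"pingd": "1"}
print("attribute kept:", xen_may_run(attrs))

# Second test: heartbeat restarted, attribute gone (as per the theory).
attrs.clear()
print("attribute missing:", xen_may_run(attrs))

# After stopping and starting the pingd clone the attribute is set again.
attrs["pingd"] = "1"
print("attribute set again:", xen_may_run(attrs))
```

If the real constraint looks different (for instance a score based on the
pingd value instead of a hard ban), the outcome for the missing attribute
may differ as well.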

Cheers,

Dejan

> thanks and a happy Christmas,
> Sebastian
>
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker