Hi Andrew, David, all,

I just found something interesting, and I don't know whether it is a bug or not.

When running "service pacemaker stop" on a node that holds a promoted drbd resource, the resource is not promoted on the other node, and the promote operation times out.

This is related to the drbd fencing integration with pacemaker and to an insufficient default (recommended) promote timeout for the drbd resource. crm-fence-peer.sh places the constraint into the CIB one second after the promote operation times out: the promote op has a 90s timeout, crm-fence-peer.sh uses that value as its own timeout, and it uses all of it when it cannot say for sure that the peer node is in a "sane" state (online or cleanly offline). Increasing the promote op timeout seems to help, but I'd expect the promotion to complete almost immediately instead of waiting an extra 90 seconds for nothing.

Looking at the crm-fence-peer.sh script, it determines the peer state to be offline immediately if the node state meets all of the following:
* it doesn't contain the "expected" attribute, or has it set to "down"
* it has the "in_ccm" attribute set to false
* it has the "crmd" attribute set to anything except "online"

On the other hand, crmd sets "expected" = "down" only after fencing is complete (probably the same for "in_ccm"?). Shouldn't it do the same (or maybe just remove that attribute) when a clean shutdown is about to complete? Or maybe it is possible to provide some different hint for crm-fence-peer.sh?

Another option (a hack, really) would be to add a delay during shutdown between stopping the resources and stopping the processes (so that the drbd handler on the other node determines that the peer is still online and places the constraint immediately), but that is very fragile.

pacemaker is a one-week-old merge of the clusterlabs and beekhof masters, drbd is 8.4.4. Everything runs on corosync2 (2.3.1) with libqb 0.16 on CentOS6.

Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
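
P.S. To make the "cleanly offline" check from this mail concrete, here is a rough bash sketch of the three conditions as I read them. It is not code taken from crm-fence-peer.sh: the helper names get_attr and peer_is_cleanly_offline are made up for illustration, and the sample node_state lines (including the "member" value) are assumed, not copied from a real CIB.

```bash
#!/bin/bash
# Hypothetical sketch, NOT the actual crm-fence-peer.sh logic: classify a
# peer from a single <node_state .../> line using the three conditions above.

# Print the value of attribute $2 found in XML fragment $1 (nothing if absent).
get_attr() {
    local xml=$1 name=$2
    local re="[[:space:]]${name}=\"([^\"]*)\""
    if [[ $xml =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# Return 0 only if ALL three conditions hold, 1 otherwise.
peer_is_cleanly_offline() {
    local xml=$1 expected in_ccm crmd
    expected=$(get_attr "$xml" expected)
    in_ccm=$(get_attr "$xml" in_ccm)
    crmd=$(get_attr "$xml" crmd)

    # 1. "expected" is missing or set to "down"
    [[ -z $expected || $expected == down ]] || return 1
    # 2. "in_ccm" is set to false
    [[ $in_ccm == false ]] || return 1
    # 3. "crmd" is anything except "online"
    [[ $crmd != online ]] || return 1
    return 0
}

# Assumed sample states (illustrative values only).
after_fencing='<node_state id="2" uname="node2" in_ccm="false" crmd="offline" expected="down"/>'
after_clean_stop='<node_state id="2" uname="node2" in_ccm="false" crmd="offline" expected="member"/>'

for state in "$after_fencing" "$after_clean_stop"; do
    if peer_is_cleanly_offline "$state"; then
        echo "cleanly offline: constraint can be placed right away"
    else
        echo "state unclear: handler keeps waiting until its timeout"
    fi
done
```

With states like the second sample, where "expected" is still not "down" after a clean stop, a check of this shape cannot succeed, which would match the 90 second wait I am seeing.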