Hello!

I would like to ask for your help, because we are having a lot of trouble with a cluster based on Pacemaker.

We have two identical nodes: PowerEdge R510 with 2x Xeon X5650, 64 GB of RAM and a MegaRAID SAS 2108 RAID controller (PERC H700). The system disk is a RAID 1 on SSDs (SSDSC2CW060A3), and there are two data volumes: one RAID 1 on WD3000FYYZ disks and one RAID 1 on WD1002FBYS disks (both Western Digital). The nodes are connected by two direct gigabit fiber links (no switch in between).

We have two DRBD volumes: /dev/drbd1 (1 TB on the WD1002FBYS disks) and /dev/drbd2 (3 TB on the WD3000FYYZ disks). On top of DRBD (used as PVs) we have LVM, with LVs for virtual machines which run under Xen.

Here is our CRM configuration - http://pastebin.com/raqsvRTA

We previously used fast USB drives instead of SSDs for the root filesystem, and that caused some trouble: I/O was lagging, so one node "thought" that the other one was having trouble and performed STONITH on it. After replacing the USB drives with SSDs we had no more trouble with that issue.

But now, from time to time, one of the nodes still gets STONITHed, and the reason is unclear to us.

For example, the last time it happened we found this in the logs:

Nov 23 15:14:24 rivendell-B crmd: [9529]: info: process_lrm_event: LRM operation primitive-LVM:1_monitor_120000 (call=54, rc=7, cib-update=124, confirmed=false) not running
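
For anyone scanning the logs for similar events, here is a minimal Python sketch that pulls the resource, operation and return code out of a line like the one above. It assumes the log format shown here; the function name parse_lrm_event and the regular expression are illustrative and not part of Pacemaker. The return-code names follow the OCF resource agent convention, where rc=7 means "not running".

```python
import re

# Subset of OCF resource agent return codes (rc=7 is the one in the log above).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Assumed layout of a crmd "process_lrm_event" line, based on the sample only.
LRM_EVENT = re.compile(
    r"process_lrm_event: LRM operation (?P<op>\S+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+),"
)

SAMPLE = (
    "Nov 23 15:14:24 rivendell-B crmd: [9529]: info: process_lrm_event: "
    "LRM operation primitive-LVM:1_monitor_120000 "
    "(call=54, rc=7, cib-update=124, confirmed=false) not running"
)


def parse_lrm_event(line):
    """Return a dict describing the LRM event, or None if the line does not match."""
    match = LRM_EVENT.search(line)
    if match is None:
        return None
    # The operation key has the form <resource>_<action>_<interval in ms>;
    # split from the right so underscores in the resource name are preserved.
    resource, action, interval = match.group("op").rsplit("_", 2)
    rc = int(match.group("rc"))
    return {
        "resource": resource,
        "action": action,
        "interval_ms": int(interval),
        "call": int(match.group("call")),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "UNKNOWN"),
    }


if __name__ == "__main__":
    print(parse_lrm_event(SAMPLE))
```

Read this way, the line says that the recurring 120-second monitor of primitive-LVM:1 reported the resource as not running.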

After that, node rivendell-B got STONITHed. Previously we had trouble with DRBD: a node stopped DRBD for no apparent reason and, again, STONITH followed. Unfortunately we did not check the logs that time.

Doing certain tasks on one of the nodes (for example "crm resource migrate" of a few Xen virtual machines) can also trigger STONITH.

Could you give us some hints? Maybe our configuration is wrong? To be honest, we had no previous experience with HA clusters, so we created it based on configuration.

It has been working for over a year now, but it keeps giving us headaches, and we are wondering whether we should drop Pacemaker and use something else (even manually stopping and starting virtual machines comes to mind).

Thank you in advance!

--
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
