Hi all,

(If this is not the correct mailing list for this question, I apologize; maybe you could give me a hint where it would fit better.)
I've set up a test cluster with Debian Wheezy, pacemaker/corosync, DRBD dual-primary and Xen on top of this. I know I should configure STONITH; up to now I just haven't, because it's a test environment.

I've got several DRBD resources and Xen domUs. For most of them live migration works like a charm, but with one DRBD resource I have problems with the state changing after a Xen domU migration (at least that's my guess). I checked my configuration for differences between the resources but didn't find any; as far as I remember, they are all set up identically.

The drbd.conf snippet of the resource looks like:

resource nfs {
        flexible-meta-disk internal;
        device /dev/drbd4;
        protocol C;
        on ha1 {
                device /dev/drbd4;
                disk /dev/XenHosting/nfs-disk;
                address 10.10.10.1:7804;
        }
        on ha2 {
                device /dev/drbd4;
                disk /dev/XenHosting/nfs-disk;
                address 10.10.10.2:7804;
        }
        net {
                data-integrity-alg sha1;
                allow-two-primaries;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        startup {
                become-primary-on both;
        }
}

Relevant snippets of the CIB:

primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource="nfs" \
        op monitor interval="20" role="Master" timeout="60" \
        op monitor interval="30" role="Slave" timeout="60"
ms ms_drbd_nfs p_drbd_nfs \
        meta master-max="2" notify="true" target-role="Started"
primitive nfs ocf:heartbeat:Xen \
        params xmfile="/cluster/xen/nfs" \
        meta allow-migrate="true" target-role="Started" \
        op monitor interval="10" \
        op start interval="0" timeout="45" \
        op stop interval="0" timeout="300" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240"
order o_nfs inf: ms_drbd_nfs:promote nfs:start

If I start the resources "clean", i.e. when they are not yet running, everything is fine. If I then stop the domU or do a live migration, I get "failed actions" like:

Failed actions:
    p_drbd_nfs:1_monitor_20000 (node=ha2, call=837, rc=0, status=complete): ok

(This changes from failed to ok within about two seconds, and the live migration is successful.)

I played around with timeouts and such, but no luck. In the logs there is "transition aborted"; could this be the problem? Besides that: "Sending state for detaching disk failed". I've put my logs on pastebin for better readability:

[0] tail -f /var/log/syslog
[1] tail -f /var/log/dmesg

I'm quite clueless what to do. Help would be really appreciated.

Thanks in advance,
Georg

[0] http://pastebin.com/09FT14Us
[1] http://pastebin.com/5nnwZjiz
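P.S. In case it helps: this is roughly how I trigger the migration and watch the DRBD state while it happens (typed from memory, so take the exact commands with a grain of salt):

# on both nodes, keep an eye on the DRBD connection state and roles
watch -n1 cat /proc/drbd

# trigger the live migration through pacemaker rather than with xm directly
crm resource migrate nfs ha2

# afterwards, remove the location constraint that the migrate command created
crm resource unmigrate nfs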