On Tue, Jun 22, 2010 at 4:05 PM, Robert Lindgren <robert.lindg...@gmail.com> wrote:
> Hi All,
>
> Sorry if this topic has come up before; I'm new to this list.
>
> I have a problem with gfs2_quotad going into D state (uninterruptible
> sleep) when I put one of my nodes in standby. As a result the
> VirtualDomain resource agent stops working, since libvirt fails to read
> from GFS (DRBD primary/primary). I'm running Ubuntu Lucid with the GFS
> tools from the Cluster-stack PPA.
>
> If only one node is started, GFS doesn't behave like this; it happens
> when one host is lost, either through standby or a yanked power cord.
>
> Any hints would be appreciated.

The last things in dmesg are:

[86392.002282] block drbd0: conn( Unconnected -> WFConnection )
[86395.120629] dlm: closing connection to node 34212362
[86395.162508] GFS2: fsid=pcmk:pcmk.0: jid=1: Trying to acquire journal lock...
[86520.420036] INFO: task kslowd000:2290 blocked for more than 120 seconds.

After that, GFS doesn't respond:

r...@sugadaddy:/var/log# ps -efl | grep D
F S UID    PID  PPID C PRI NI ADDR   SZ WCHAN  STIME TTY     TIME CMD
1 D root  2290     2 0  75 -5 -       0 dlm_lo Jun22 ?   00:00:00 [kslowd000]
1 D root  2310     2 0  80  0 -       0 dlm_lo Jun22 ?   00:00:00 [gfs2_quotad]
0 D root 18713  1147 0  80  0 -    4658 dlm_lo 15:54 ?   00:00:00 /usr/lib/libvirt/virt-aa-helper -r -u libvirt-c0b6fc07-7195-4e31-7ce7-e12d5b71bdee

It looks like gfs2_quotad is stalling, but since its parent PID is 2 (kthreadd, i.e. these are kernel threads), it's hard to tell whether quotad itself is stalling or whether it is waiting for the journal lock. I'm not sure why I don't get the journal lock...
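A side note on the `ps -efl | grep D` filter above: it matches any line containing a capital D anywhere, which is why the header line (PID, ADDR, CMD) shows up in the output. A minimal sketch of a tighter filter, assuming procps `ps` (the `-o` format list and the `wchan:32` width are procps options); `list_dstate` is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# list_dstate (hypothetical helper): reads `ps -eo pid,ppid,stat,wchan:32,args`
# output on stdin, skips the header line and prints only tasks whose STAT
# column (3rd field) starts with D, i.e. uninterruptible sleep.
list_dstate() {
    awk 'NR > 1 && $3 ~ /^D/'
}

# A wider WCHAN column so the kernel wait function is not cut off at 6 characters.
ps -eo pid,ppid,stat,wchan:32,args | list_dstate
```

For each task it lists, `cat /proc/<pid>/stack` (as root, on kernels built with stack-trace support) shows the kernel call chain the task is blocked in, which may help tell whether gfs2_quotad is waiting on the journal lock or on something else in the DLM.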
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker