Hi,

On Thu, Oct 03, 2013 at 11:41:00PM +0000, Andrew Daugherity wrote:
> On Oct 1, 2013, at 2:41 PM, pacemaker-requ...@oss.clusterlabs.org wrote:
> > Message: 4
> > Date: Tue, 1 Oct 2013 19:22:12 +0200
> > From: Dejan Muhamedagic <deja...@fastmail.fm>
> > To: pacemaker@oss.clusterlabs.org
> > Subject: Re: [Pacemaker] Bug? Resources running with realtime priority
> >         - possibly causing monitor timeouts
> > Message-ID: <20131001172212.GC6892@walrus.homenet>
> > Content-Type: text/plain; charset=us-ascii
> >
> > Hi,
> >
> > On Tue, Oct 01, 2013 at 11:07:35AM +0200, Joschi Brauchle wrote:
> >> Hello everyone,
> >>
> >> on two (recently upgraded) SLES11SP3 machines, we are running an
> >> active/passive NFS fileserver and several other high-availability
> >> services using corosync + pacemaker (see version numbers below).
> >>
> >> We are having severe problems with resource monitors timing out
> >> during our system backup at night, when the active machine is under
> >> high I/O load. These problems did not exist under SLES11SP1, from
> >> which we upgraded just a few days ago.
> >>
> >> After some diagnosis, it turns out that all cluster resources
> >> started by pacemaker are actually running with realtime priority,
> >> which includes our backup service. This does not seem correct!
> >
> > Oops. Looks like neither corosync nor lrmd resets the priority and
> > scheduler for their children.
> >
> >> As far as we remember from SLES11SP1, the resources did not run
> >> with realtime priority there. Hence, this looks like a bug in the
> >> more recent pacemaker/corosync versions?!
> >
> > Looks like it. Can you please open a support call.
>
> Dejan,
>
> Any idea if SP2 is also affected?
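For anyone wanting to verify whether pacemaker-spawned resource processes really inherit a realtime scheduling class, the standard procps/util-linux tools are enough; a quick check along these lines (PID 1 is used only as an example target):

```shell
# List the scheduling class and realtime priority of all processes.
# CLS column: TS = SCHED_OTHER (normal), FF = SCHED_FIFO, RR = SCHED_RR
# (the latter two are realtime classes); RTPRIO is "-" for non-RT tasks.
ps -eo pid,cls,rtprio,comm | head -n 20

# Query one process in detail (substitute the PID of a resource child):
chrt -p 1
```

On an affected node, the resource agents and the backup job would show up with class FF or RR and a nonzero RTPRIO instead of TS.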
No.

Thanks,

Dejan

> Fortunately, it shouldn't affect me, since I'm just managing VMs (and
> mounting filesystems) with pacemaker, and not spawning a bunch of
> long-running processes.
>
> Joschi,
>
> As a workaround (and a potential best practice anyway), try setting
> elevator=deadline in the kernel boot parameters. This gives better
> responsiveness under heavy I/O load. I'm not sure how effective it will
> be with everything running at realtime priority, but assuming you're
> I/O-bound rather than CPU-bound, it should help, and it is something I
> now set on all cluster members.
>
> Before setting this, during periods of high I/O on the SAN (such as
> migrating several VMs at once during 'rcopenais stop' on one node),
> monitor operations would occasionally time out and pacemaker would
> needlessly stop and start unrelated VMs, thinking they had failed.
> Afterwards, no more problems.
>
> Andrew Daugherity
> Systems Analyst
> Division of Research, Texas A&M University
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
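For reference, a sketch of how Andrew's elevator=deadline workaround can be checked and applied; the device name /dev/sda is only an example, and the GRUB config path shown is the usual SLES 11 location:

```shell
# Show the active I/O scheduler for a disk; the bracketed entry is the
# one currently in use, e.g. "noop deadline [cfq]".
cat /sys/block/sda/queue/scheduler

# Switch at runtime (requires root; takes effect immediately, but does
# not survive a reboot):
echo deadline > /sys/block/sda/queue/scheduler

# To make it persistent, append elevator=deadline to the kernel command
# line, e.g. on the "kernel" line in /boot/grub/menu.lst on SLES 11,
# then reboot.
```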