Hi,

On Thu, Oct 03, 2013 at 11:41:00PM +0000, Andrew Daugherity wrote:
> On Oct 1, 2013, at 2:41 PM, pacemaker-requ...@oss.clusterlabs.org wrote:
> > Message: 4
> > Date: Tue, 1 Oct 2013 19:22:12 +0200
> > From: Dejan Muhamedagic <deja...@fastmail.fm>
> > To: pacemaker@oss.clusterlabs.org
> > Subject: Re: [Pacemaker] Bug? Resources running with realtime priority
> >     - possibly causing monitor timeouts
> > Message-ID: <20131001172212.GC6892@walrus.homenet>
> > Content-Type: text/plain; charset=us-ascii
> > 
> > Hi,
> > 
> > On Tue, Oct 01, 2013 at 11:07:35AM +0200, Joschi Brauchle wrote:
> >> Hello everyone,
> >> 
> >> on two (recently upgraded) SLES11SP3 machines, we are running an
> >> active/passive NFS fileserver and several other high availability
> >> services using corosync + pacemaker (see version numbers below).
> >> 
> >> We are having severe problems with resource monitors timing out
> >> during our system backup at night, where the active machine is under
> >> high IO load. These problems did not exist under SLES11SP1, from
> >> which we just upgraded some days ago.
> >> 
> >> After some diagnosis, it turns out that all cluster resources
> >> started by pacemaker, including our backup service, are running
> >> with realtime priority. This does not seem correct!
> >> 
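For anyone following the thread who wants to verify this on their own
node: the scheduling policy of a running resource process can be checked
with `chrt -p <pid>`, or with a few lines of Python. SCHED_FIFO or
SCHED_RR means the process is running realtime. A sketch (my own, not
from the original report), assuming a Linux host and Python >= 3.3:

```python
import os

# Map a pid's scheduling policy to a readable name.
# SCHED_OTHER is the normal time-sharing policy; SCHED_FIFO and
# SCHED_RR are the realtime policies described in the report.
def policy_name(pid=0):
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",
        os.SCHED_FIFO: "SCHED_FIFO",
        os.SCHED_RR: "SCHED_RR",
    }
    return names.get(os.sched_getscheduler(pid), "other")

# pid 0 means "this process"; substitute the pid of a resource
# process (e.g. the backup service) to inspect it instead.
print(policy_name(0))
```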
> > Oops. Looks like neither corosync nor lrmd reset the priority and
> > scheduler for their children.
> > 
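The fix would presumably be for the daemon to reset both in the child,
between fork() and exec of the resource agent. A minimal sketch of that
reset in Python (illustration only; lrmd itself is written in C, but the
underlying Linux calls are the same):

```python
import os

def reset_scheduling(pid=0):
    # Put the process back on the default time-sharing scheduler
    # with static priority 0 (the only valid value for SCHED_OTHER)...
    os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))
    # ...and reset the nice value to 0. (Lowering a positive nice
    # value needs CAP_SYS_NICE, which a realtime daemon would have.)
    os.setpriority(os.PRIO_PROCESS, pid, 0)

# In the daemon this would run in the child between fork() and exec().
reset_scheduling()
print(os.sched_getscheduler(0) == os.SCHED_OTHER)
```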
> >> As far as we remember from SLES11SP1, the resources were not running
> >> in realtime priority there. Hence, this looks like a bug in the more
> >> recent pacemaker/corosync version?!?
> > 
> > Looks like it. Can you please open a support call?
> Dejan,
> 
> Any idea if SP2 is also affected?

No.

Thanks,

Dejan

> Fortunately, it shouldn't affect me, since I'm just managing VMs (and 
> mounting filesystems) with pacemaker, and not spawning a bunch of 
> long-running processes.
> 
> 
> Joschi,
> 
> As a workaround (and potential best practice anyway), try setting 
> elevator=deadline in the kernel boot parameters.  This will give better 
> response under heavy I/O load.  I'm not sure how effective it will be with 
> everything running realtime priority, but assuming you're I/O-bound rather 
> than CPU-bound, it should help, and is something I now set on all cluster 
> members.
> 
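For the archives, here is roughly what that looks like in
/boot/grub/menu.lst on SLES 11 (GRUB legacy). The title, device names
and other parameters below are made up for illustration; keep your own
and only add elevator=deadline:

```
title SLES 11
    root (hd0,0)
    kernel /boot/vmlinuz root=/dev/sda2 elevator=deadline splash=silent
    initrd /boot/initrd
```

The scheduler can also be switched per device at runtime by writing
"deadline" to /sys/block/<dev>/queue/scheduler, which lets you test the
effect before touching the boot configuration.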
> Before setting this, during periods of high I/O on the SAN (such as migrating 
> several VMs at once during 'rcopenais stop' on one node), occasionally 
> monitor operations would time out and pacemaker would stop and start 
> unrelated VMs needlessly, thinking they had failed.  Afterwards, no more 
> problems.
> 
> 
> Andrew Daugherity
> Systems Analyst
> Division of Research, Texas A&M University
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
