Re: [Pacemaker] Pacemaker Corosync Issue

Andrew Beekhof Thu, 16 Oct 2014 02:33:37 -0700

On 16 Oct 2014, at 7:56 pm, Sahil Aggarwal <sahilaggarw...@gmail.com> wrote:


> Sorry, I didn't get your point, so I am reiterating the problem:
> 
> Two-node cluster: Node A and Node B.
> 
> Service X is running on Node A; Node B is the DC.
> 
> We are using the Corosync stack with Pacemaker.
> The failure timeout is 10 s.
> target-role is Started.
> 
> Events happen like this:
>       • Node A sends an event to Node B that Service X is down.
>       • Node B logs "Ignoring expired failure" for Service X.
>       • After this, Service X is never restarted by the cluster.
> 
> 
> Now questions are:
> 
>       • Why is Node B (DC) ignoring the expired failure?

Because you told it to
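Andrew's answer refers to the resource's failure-timeout: once it is set, a recorded failure stops counting when the current time is more than failure-timeout seconds past the time the failure was recorded. The sketch below models only that comparison. The class and function names are hypothetical and this is not Pacemaker's code; it assumes the failure time is the timestamp stored with the failed operation in the cluster status (e.g. the `last-rc-change` attribute), and it assumes a strictly-greater-than comparison.

```python
from dataclasses import dataclass


@dataclass
class RecordedFailure:
    """Hypothetical model of one failed operation in the cluster status."""
    resource: str
    rc: int              # OCF return code, e.g. 7 = OCF_NOT_RUNNING
    last_rc_change: int  # epoch seconds when the failure was recorded (assumed)


def is_expired(failure: RecordedFailure, failure_timeout: int, now: int) -> bool:
    """Return True if the failure is older than failure_timeout seconds.

    A failure_timeout of 0 is treated as "never expire", matching the
    documented default of 0 (disabled).
    """
    if failure_timeout <= 0:
        return False
    # The count starts at the recorded failure time, not at the moment the
    # DC happens to recalculate the cluster state (assumption).
    return now > failure.last_rc_change + failure_timeout


failure = RecordedFailure("Server", rc=7, last_rc_change=1_000)
for now in (1_005, 1_010, 1_011, 1_060):
    print(now, is_expired(failure, failure_timeout=10, now=now))
```

Under these assumptions, any recalculation that happens more than 10 s after the failure was recorded already sees it as expired, which is why a 10 s failure-timeout is so short.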

>       • Even if the DC ignored it this time, Service X is still down, so Node A 
> should keep monitoring the service and send a failure status to Node B again, 
> at which point Node B should restart the service. Why is this not happening?
> 
> 
> For failure-timeout, my understanding is:
> 
>       • Node A sends a failure event for Service X to Node B (DC) at time T, the 
> failcount of Service X on Node A reaches INFINITY, and Node A is the only 
> node where Service X can run.
>       • At T + failure-timeout seconds, Node B (DC) will reset the failcount of 
> Service X on Node A to zero and restart Service X on Node A.
> 
> 
> According to you, Node B will ignore the Service X failure on Node A after 
> failure-timeout seconds. From which point does Node B start counting those seconds?
> 
> 
> 
> On Thu, Oct 16, 2014 at 1:07 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> 
> On 16 Oct 2014, at 6:33 pm, Sahil Aggarwal <sahilaggarw...@gmail.com> wrote:
> 
> > Hello ,
> >
> > Yes, that log might be due to that reason, but it should not ignore the 
> > resource, as it is not taking any action for that resource, i.e. it is not 
> > starting the resource.
> 
> it doesn't know that at the time
> 
> >
> > Second thing:
> >
> > Generally, the "Ignoring expired failure" log appears as
> >  notice: unpack_rsc_op: Ignoring expired failure Server_last_failure_0
> >
> > but in the case where the service is ignored, the log appears as
> >  notice: unpack_rsc_op: Ignoring expired failure (calculated) 
> > Server_last_failure_0
> >
> > This might be a different case.
> 
> possibly in the old code, but the latest has them combined
> 
> >
> > Please suggest.
> >
> >
> >
> > On Thu, Oct 16, 2014 at 2:38 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> > You don't think that might be a little short?
> > Any failure that happened more than 10s ago is going to be ignored, leading to 
> > the pengine message you saw.
> >
> > On 16 Oct 2014, at 12:21 am, Sahil Aggarwal <sahilaggarw...@gmail.com> 
> > wrote:
> >
> > > The failure timeout for the resource is 10s.
> > >
> > > On Wed, Oct 15, 2014 at 2:51 AM, Andrew Beekhof <and...@beekhof.net> 
> > > wrote:
> > >
> > > On 15 Oct 2014, at 4:23 am, Sahil Aggarwal <sahilaggarw...@gmail.com> 
> > > wrote:
> > >
> > > >
> > > > Hello Team Pacemaker,
> > > >
> > > > I am facing a persistent issue with Pacemaker: it does not restart the 
> > > > service even when it knows that the service is down. It logs the 
> > > > message "Ignoring expired failure" for the service.
> > >
> > > What is the failure timeout set to?
> > >
> > > > The Pacemaker and Corosync versions are given below. OS: CentOS 6.2.
> > > >
> > > > corosync-1.4.1-4.el6_2.2.x86_64 pacemaker-1.1.9-2.el6.x86_64
> > > >
> > > > The log that pengine provides is:
> > > >
> > > >  pengine[45232]:   notice: unpack_rsc_op: Ignoring expired failure 
> > > > (calculated) Server_last_failure_0 (rc=7, 
> > > > magic=0:7;14:5699:0:459093cc-f3a1-483b-b853-53a1d9791361)
> > > >
> > > > Some more info:
> > > >
> > > > 1. This is a two-node cluster. There is a time difference of 10 min 
> > > > between the two nodes.
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Sahil
> > > > Mobile - 09467607999
> > > > fbAddress-www.facebook.com/SahilAggarwalg
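The 10-minute clock difference mentioned in the original report may matter here. The sketch below assumes (this is not confirmed in the thread) that the failure timestamp is written with the clock of the node that ran the monitor (Node A), while the expiry check uses the clock of the DC (Node B). Under that assumption, if the DC's clock is 600 s ahead, a failure recorded one second ago looks 601 s old, far past a 10 s failure-timeout. The helper names are hypothetical.

```python
SKEW_SECONDS = 600     # the 10-minute difference reported in the thread
FAILURE_TIMEOUT = 10   # failure-timeout configured on the resource


def apparent_age(recorded_at_node_clock: int, dc_now: int) -> int:
    """Age of a failure as seen by the DC, in seconds (hypothetical helper)."""
    return dc_now - recorded_at_node_clock


def dc_sees_expired(recorded_at_node_clock: int, dc_now: int, timeout: int) -> bool:
    """Assumed expiry rule: the failure is strictly older than the timeout."""
    return timeout > 0 and apparent_age(recorded_at_node_clock, dc_now) > timeout


# Node A's monitor fails and stamps the failure using Node A's clock.
node_a_clock_at_failure = 1_000
# One real second later the DC recalculates; its clock runs SKEW_SECONDS ahead.
dc_clock_now = node_a_clock_at_failure + 1 + SKEW_SECONDS

print(apparent_age(node_a_clock_at_failure, dc_clock_now))  # 601
print(dc_sees_expired(node_a_clock_at_failure, dc_clock_now, FAILURE_TIMEOUT))  # True

# With synchronized clocks the same failure is only 1 s old and still counts.
print(dc_sees_expired(node_a_clock_at_failure, node_a_clock_at_failure + 1,
                      FAILURE_TIMEOUT))  # False
```

If the assumption holds, synchronizing the node clocks and using a failure-timeout well above 10 s would both be worth checking.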

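The `magic` field in the pengine log line from the original report can be unpacked to see what the failed operation actually returned. The sketch assumes the layout `op_status:rc;action_id:transition_id:target_rc:uuid`, which is believed to be how Pacemaker 1.1 encodes transition magic; treat the layout as an assumption and check it against the source of your version. The function and type names are hypothetical; the return-code names are the standard OCF ones.

```python
from typing import NamedTuple

# Standard OCF resource agent return codes relevant to this thread.
OCF_RC_NAMES = {0: "OCF_SUCCESS", 1: "OCF_ERR_GENERIC", 7: "OCF_NOT_RUNNING"}


class TransitionMagic(NamedTuple):
    op_status: int
    rc: int
    action_id: int
    transition_id: int
    target_rc: int
    uuid: str


def decode_magic(magic: str) -> TransitionMagic:
    """Split an assumed 'status:rc;action:transition:target_rc:uuid' string."""
    status_rc, key = magic.split(";", 1)
    op_status, rc = (int(x) for x in status_rc.split(":"))
    # The UUID contains hyphens but no colons, so at most 3 splits are needed.
    action_id, transition_id, target_rc, uuid = key.split(":", 3)
    return TransitionMagic(op_status, rc, int(action_id), int(transition_id),
                           int(target_rc), uuid)


m = decode_magic("0:7;14:5699:0:459093cc-f3a1-483b-b853-53a1d9791361")
print(m.rc, OCF_RC_NAMES.get(m.rc), "expected", m.target_rc)
# prints: 7 OCF_NOT_RUNNING expected 0
```

Read this way, the monitor found Service X not running (rc=7) where running (0) was expected, which is the failure the DC then treated as expired.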

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
