On 16 Oct 2014, at 7:56 pm, Sahil Aggarwal <sahilaggarw...@gmail.com> wrote:
> Sorry, I didn't get your point, so I am reiterating the problem:
>
> Two-node cluster: Node A, Node B.
>
> Service X is running on Node A; Node B is the DC.
>
> We are using the corosync stack with Pacemaker.
> Failure Timeout is 10 sec.
> Target-Role is started.
>
> Events happen like this:
> • Node A sends an event to Node B: Service X is down
> • Node B prints "Ignoring expired failure" for Service X
> • After this, Service X is never restarted by the cluster.
>
>
> Now the questions are:
>
> • Why is Node B (DC) ignoring the expired failure?

Because you told it to

> • Even if the DC ignored it this time, Service X is still down, so Node A
> should monitor the service and send the failure status to Node B again, and
> at that time Node B should restart the service. Why is this not happening?
>
>
> For FAILURE TIMEOUT, my understanding is:
>
> • Node A sends the failure event of Service X to Node B (DC) at time T, the
> failcount of Service X on Node A reaches infinity, and Node A is the only
> node where Service X can run
> • Node B (DC) will, after T+FailureTimeoutSeconds, set the
> failcount of Service X on Node A to zero and restart Service X on
> Node A.
>
>
> According to you, Node B will ignore the Service X failure on Node A after
> Failure Timeout seconds. From which point does Node B start counting those
> seconds?
>
>
>
> On Thu, Oct 16, 2014 at 1:07 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> > On 16 Oct 2014, at 6:33 pm, Sahil Aggarwal <sahilaggarw...@gmail.com> wrote:
> > > Hello,
> >
> > Yes, that log might be due to that reason, but it should not ignore the
> > resource, as it is not taking any action for that resource, i.e. not
> > starting the resource.
>
> it doesn't know that at the time
>
> >
> > and second thing
> >
> > generally the "ignoring expired failure" log comes as
> > notice: unpack_rsc_op: Ignoring expired failure Server_last_failure_0
> >
> > but in the case where the service is ignored, the log comes as
> > notice: unpack_rsc_op: Ignoring expired failure (calculated)
> > Server_last_failure_0
> >
> > this might be some other case.
>
> possibly in the old code, but the latest has them combined
>
> >
> > Please suggest.
> >
> >
> >
> > On Thu, Oct 16, 2014 at 2:38 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> > You don't think that might be a little short?
> > Any failure that happened more than 10s ago is going to be ignored, leading to
> > the pengine message you saw.
> >
> > On 16 Oct 2014, at 12:21 am, Sahil Aggarwal <sahilaggarw...@gmail.com>
> > wrote:
> >
> > > failure timeout for resource is 10s.
> > >
> > > On Wed, Oct 15, 2014 at 2:51 AM, Andrew Beekhof <and...@beekhof.net>
> > > wrote:
> > >
> > > On 15 Oct 2014, at 4:23 am, Sahil Aggarwal <sahilaggarw...@gmail.com>
> > > wrote:
> > >
> > > >
> > > > Hello Team Pacemaker,
> > > >
> > > > I am facing a constant issue with Pacemaker: it does not restart the
> > > > service even when it knows that the service is down. It generates a
> > > > message saying "Ignoring Expired Failure" for the service.
> > >
> > > What is the failure timeout set to?
> > >
> > > > Pacemaker and Corosync versions are given below. OS: CentOS 6.2
> > > >
> > > > corosync-1.4.1-4.el6_2.2.x86_64 pacemaker-1.1.9-2.el6.x86_64
> > > >
> > > > The log that pengine provides is:
> > > >
> > > > pengine[45232]: notice: unpack_rsc_op: Ignoring expired failure
> > > > (calculated) Server_last_failure_0 (rc=7,
> > > > magic=0:7;14:5699:0:459093cc-f3a1-483b-b853-53a1d9791361)
> > > >
> > > > Some more info:
> > > >
> > > > 1. This is a two-node cluster. There is a time difference of 10 min
> > > > between the two nodes.
> > > >
> > > > --
> > > > Regards,
> > > > Sahil
> > > > Mobile - 09467607999
> > > > fbAddress-www.facebook.com/SahilAggarwalg
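
To make the "from which point are the seconds counted" question concrete, here is a minimal sketch of an expiry check. It is a simplified model for illustration, not Pacemaker's actual code: the function name `failure_expired` and the variable names are hypothetical, and it assumes the failure timestamp comes from the clock of the node that saw the failure while the DC compares it against its own clock. It also assumes the DC is the node whose clock runs ahead. The 10 s timeout and the 10 minute clock difference are the values quoted in the thread.

```python
def failure_expired(last_failure: int, now: int, failure_timeout: int) -> bool:
    """Return True if a failure recorded at `last_failure` is older than the timeout.

    All values are in seconds. A timeout of 0 is treated as "never expires".
    This is a simplified model, not Pacemaker's implementation.
    """
    return failure_timeout > 0 and now > last_failure + failure_timeout


FAILURE_TIMEOUT = 10   # seconds, as configured in the thread
CLOCK_SKEW = 600       # seconds; assumes the DC's clock is the one running ahead

# Arbitrary timestamp of the failure on Node A's clock (hypothetical value).
failure_on_node_a = 1_000_000

# The DC evaluates the failure at the same real instant, but on its own clock.
dc_now = failure_on_node_a + CLOCK_SKEW

# With synchronized clocks a brand-new failure is not expired.
in_sync = failure_expired(failure_on_node_a, failure_on_node_a, FAILURE_TIMEOUT)

# With 600 s of skew the same failure already looks 600 s old to the DC.
skewed = failure_expired(failure_on_node_a, dc_now, FAILURE_TIMEOUT)

print(in_sync, skewed)
```

Under these assumptions, a failure that has only just been reported already looks older than the 10 s timeout to the DC, so it would be treated as expired straight away. If the skew is in the other direction, the effect would be the opposite (failures would take longer to expire), so checking time synchronization between the nodes is a reasonable first step either way.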
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org