On 30 Jul 2014, at 2:37 am, divinesecret <arvy...@artogama.lt> wrote:

> No DHCP.
> No NetworkManager.
> 
> Somehow findif fails to find eth1 at random times (only eth1; the resources 
> on eth2 and eth3 have no such problem).
> 
> Any ideas?

IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up

^^^ Does that imply that the agent may also take the device down under some 
conditions? Perhaps look through the agent to see when that might happen and 
whether it could be happening in your cluster.
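
To check whether anything else touched the device around the failure, it may help to pull every syslog line that names it, not just the IPaddr2 ones. Below is a minimal sketch in Python, assuming the log lines have the same prefix format as the ones quoted further down; the helper name mentions_of is made up for illustration and is not part of Pacemaker or the resource agents.

```python
import re

# Syslog prefix as it appears in the quoted logs:
# "Jul  9 04:17:58 host tag[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^\s:]+):\s+(.*)$"
)


def mentions_of(device, lines):
    """Return (timestamp, tag, message) for each syslog line naming the device.

    Hypothetical helper for illustration only.
    """
    # Word boundaries keep "eth1" from also matching "eth10".
    word = re.compile(r"\b%s\b" % re.escape(device))
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and word.search(m.group(4)):
            hits.append((m.group(1), m.group(3), m.group(4)))
    return hits


# Two lines taken from the report in this thread.
sample = [
    "Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: "
    "ERROR: Unknown interface [eth1] No such device.",
    "Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: "
    "ERROR: [findif] failed",
]
for stamp, tag, message in mentions_of("eth1", sample):
    print(stamp, tag, message)
```

To run it against a real log, pass an open file object instead of the sample list. The path depends on the syslog setup; /var/log/messages is only an assumption here. Any hit from a tag other than IPaddr2 shortly before the failure would be worth a closer look.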

> 
> 2014-07-10 01:26, Andrew Beekhof wrote:
>> Is NetworkManager present?  Using dhcp for that interface?
>> On 9 Jul 2014, at 7:03 pm, divinesecret <arvy...@artogama.lt> wrote:
>>> Hi,
>>> Just wanted to ask whether anyone has encountered this situation.
>>> Suddenly the cluster fails:
>>> Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: Unknown interface [eth1] No such device.
>>> Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: [findif] failed
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: process_lrm_event: LRM operation extVip51_monitor_20000 (call=57, rc=6, cib-update=2151, confirmed=false) not configured
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:  warning: update_failcount: Updating failcount for extVip51 on sdcsispprxfe1 after failed monitor: rc=6 (update=value++, time=1404868678)
>>> Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-extVip51 (1)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:   notice: unpack_config: On loss of CCM Quorum: Ignore
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_perform_update: Sent update 42: fail-count-extVip51=1
>>> Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-extVip51 (1404868678)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:    error: unpack_rsc_op: Preventing extVip51 from re-starting anywhere in the cluster : operation monitor failed 'not configured' (rc=6)
>>> Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:  warning: unpack_rsc_op: Processing failed op monitor for extVip51 on sdcsispprxfe1: not configured (6)
>>> A restart was issued, and then:
>>> IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up
>>> ....
>>> Version: 1.1.10-14.el6_5.3-368c726
>>> CentOS 6.5
>>> (Other logs don't show eth1 going down or anything similar.)
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org

