Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

Sergey Arlashin Sun, 04 Jan 2015 22:19:44 -0800

Pacemaker 1.1.6

It runs on Ubuntu 12.04 LTS, 64-bit.


Linux lb-node1 3.11.0-23-generic #40~precise1-Ubuntu SMP Wed Jun 4 22:06:36 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

--
Best regards,
Sergey Arlashin
 

On Jan 5, 2015, at 7:59 AM, Andrew Beekhof <and...@beekhof.net> wrote:

> Pacemaker version? It looks familiar, but it depends on the version number.
> 
>> On 29 Dec 2014, at 10:24 pm, Sergey Arlashin <sergeyarl.maill...@gmail.com> wrote:
>> 
>> Hi!
>> Recently I've noticed that one of my nodes had OFFLINE status in the 'crm
>> status' output, but it was not actually offline: I could ssh into this node,
>> and I could get 'crm status' from that node's console. After some time it
>> came back online. The same thing has happened several times to other nodes
>> without any obvious reason.
>> 
>> Still, there are no error or fatal messages in the logs. The only warning
>> messages I could get from corosync.log were the following:
>> 
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1346 -> 0.233.1347 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1347 -> 0.233.1348 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1348 -> 0.233.1349 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1349 -> 0.233.1350 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1350 -> 0.233.1351 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1351 -> 0.233.1352 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1352 -> 0.233.1353 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1353 -> 0.233.1354 not applied to 0.233.1354: current "num_updates" is greater than required
>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 491 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 494 for fail-count-Cachier=1 failed: Application of an update diff failed
>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 497 for probe_complete=true failed: Application of an update diff failed
>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 500 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
>> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 503 for fail-count-Cachier=1 failed: Application of an update diff failed
>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1338 -> 0.233.1339 not applied to 0.233.1382: current "num_updates" is greater than required
>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1339 -> 0.233.1340 not applied to 0.233.1382: current "num_updates" is greater than required
>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1340 -> 0.233.1341 not applied to 0.233.1382: current "num_updates" is greater than required
>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1341 -> 0.233.1342 not applied to 0.233.1382: current "num_updates" is greater than required
>> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1342 -> 0.233.1343 not applied to 0.233.1382: current "num_updates" is greater than required
>> 
>> After examining the corosync processes with ps, I found that all my nodes
>> have zombie corosync processes like these:
>> 
>> root     13892  0.0  0.0      0     0 ?        Z    Dec26   0:04 [corosync] <defunct>
>> root     21793  0.0  0.0      0     0 ?        Z    Dec26   0:00 [corosync] <defunct>
>> root     27009  1.3  1.0 714292 10784 ?        Ssl  Dec18 223:38 /usr/sbin/corosync
>> 
>> Is it OK to have zombie corosync processes on the nodes, or does it suggest
>> that something is going wrong?
>> 
>> Thanks in advance
>> 
>> --
>> Best regards,
>> Sergey Arlashin
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

