Hi,
I have a corosync/pacemaker cluster running on Ubuntu 10.04.2. The
following errors keep getting appended to the syslog:
Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR: socket_client_channel_new:
socket: Too many open files
Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR:
init_client_ipc_comms_nodispatch: Could not access channel on:
/var/run/crm/pengine
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_pe_control: Setup of
client connection failed, not adding channel to mainloop
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_log: FSA: Input I_FAIL
from do_pe_control() received in state S_INTEGRATION
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_join_offer_all:
join-24: Waiting on 2 outstanding join acks
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_takeover: Taking over
DC status for this partition
root@filer-1:~# lsof -p `pidof crmd` | grep socket | wc -l
1019
root@filer-1:~# cat /proc/2970/limits | grep 'open files'
Max open files            1024                 1024                 files
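For anyone who wants to repeat the check without lsof, here is a rough sketch that reads the same numbers straight from /proc. It assumes a Linux /proc layout, and fd_usage is only an illustrative helper name, not something shipped with pacemaker or cluster-glue. Note that it counts all open descriptors, not just sockets, so the number can be slightly higher than the lsof figure above.

```shell
#!/bin/sh
# Sketch: print "<open descriptors> <soft limit>" for a given PID.
# fd_usage is a hypothetical helper name used for illustration only.
fd_usage() {
    pid="$1"
    # Each entry under /proc/<pid>/fd is one open descriptor (Linux only).
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # In /proc/<pid>/limits the 4th field of the "Max open files" line
    # is the soft limit, the 5th is the hard limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used $soft"
}

# Example (run as root so /proc/<pid>/fd is readable):
#   fd_usage "$(pidof crmd)"
```

If the first number sits right at the second, the process has run out of descriptors, which would match the "Too many open files" error in the log.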
I almost fainted when I saw this one :)
crm(live)# status
============
Last updated: Fri Dec 7 06:38:48 2012
Stack: openais
Current DC: filer-1 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
11 Resources configured.
============
OFFLINE: [ filer-2 filer-1 ]
As far as I know, killall -9 crmd will release the used FDs. Does
anyone have any idea how this will play out? I tested killing crmd on
another cluster (without this problem) and all resources were migrated
to the second node. What could happen in this case, where cluster
communication is broken? Has anyone dealt with a similar problem?
Resources are currently running on filer-1, the node that was MASTER
before this problem occurred.
Packages:
pacemaker - Version: 1.0.8+hg15494-2ubuntu2
corosync - Version: 1.2.0-0ubuntu1
cluster-glue - Version: 1.0.5-1
libcorosync4 - Version: 1.2.0-0ubuntu1
libheartbeat2 - Version: 1:3.0.3-1ubuntu1
Any help/advice would be really appreciated :)
--
Piotr Jewiec
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org