Hi,

I have a corosync/pacemaker cluster running on Ubuntu 10.04.2. The following error is getting appended to the syslog:

Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR: socket_client_channel_new: socket: Too many open files
Dec 6 20:44:46 filer-1 crmd: [2970]: ERROR: init_client_ipc_comms_nodispatch: Could not access channel on: /var/run/crm/pengine
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_pe_control: Setup of client connection failed, not adding channel to mainloop
Dec 6 20:44:46 filer-1 crmd: [2970]: WARN: do_log: FSA: Input I_FAIL from do_pe_control() received in state S_INTEGRATION
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_join_offer_all: join-24: Waiting on 2 outstanding join acks
Dec 6 20:44:46 filer-1 crmd: [2970]: info: do_dc_takeover: Taking over DC status for this partition


root@filer-1:~# lsof -p `pidof crmd` | grep socket | wc -l
1019

root@filer-1:~# cat /proc/2970/limits | grep 'open files'
Max open files 1024 1024 files
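
For reference, the same comparison can be scripted without lsof. This is only a rough sketch, assuming a Linux /proc filesystem; the function name fd_usage is made up for illustration and is not part of pacemaker or cluster-glue:

```shell
# Hypothetical helper: print "<open fds> <soft limit>" for a given PID.
# Assumes Linux /proc; run as root or as the owner of the process.
fd_usage() {
    pid=$1
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l)
    # In the "Max open files" row, the fourth field is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$open $soft"
}

# Example call (needs a running crmd):
#   fd_usage "$(pidof crmd)"
```

Unlike the lsof count above, this counts every descriptor, not only sockets, so it is the number that actually runs into the 1024 limit.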

I almost fainted when I saw this one :)

crm(live)# status
============
Last updated: Fri Dec  7 06:38:48 2012
Stack: openais
Current DC: filer-1 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
11 Resources configured.
============

OFFLINE: [ filer-2 filer-1 ]

As far as I understand, killall -9 crmd will release the used FDs. Does anyone have any idea how this will work out? I tested killing crmd on another cluster (without this problem) and all resources were migrated to the second node. What could happen in this case, where cluster communication is busted? Has anyone ever dealt with a similar problem? Resources are currently running on filer-1, the node which had been MASTER before this problem occurred.

Packages:

pacemaker - Version: 1.0.8+hg15494-2ubuntu2
corosync - Version: 1.2.0-0ubuntu1
cluster-glue - Version: 1.0.5-1
libcorosync4 - Version: 1.2.0-0ubuntu1
libheartbeat2 - Version: 1:3.0.3-1ubuntu1

Any help/advice would be really appreciated :)
--
Piotr Jewiec

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
