Hi Open MPI folks,

We use Sun MPI (ClusterTools 8.2) and also a native Open MPI 1.3.3, and we wonder about the way Open MPI devours file descriptors: on our machines, ulimit -n is currently set to 1024, and we found that we can run at most 84 MPI processes per box. If we try to run 85 or more processes, we get an error message like this:
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can be open
.....
--------------------------------------------------------------------------

Simple arithmetic tells us that 1024/85 is about 12. This leads us to believe that there is a single Open MPI process which needs about 12 file descriptors per MPI process.
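For what it is worth, one can verify this kind of arithmetic directly on Linux by counting a process's open descriptors under /proc (a rough check; replace the shell's own PID, $$, with the PID of an mpiexec or MPI rank process to inspect that one instead):

```shell
# count the file descriptors currently open in a process
# $$ is the current shell; substitute an MPI process's PID to inspect it
ls /proc/$$/fd | wc -l
```

Comparing this count before and after ranks connect should show how many descriptors each connection actually costs.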
So far we have only one box with more than 100 CPUs on which it would be meaningful to run more than 85 processes. But in the quite near future many-core boxes are coming (we have also ordered 128-way Nehalem machines), so consuming a lot of file descriptors per MPI process may become a real disadvantage.
We see one possibility to avoid this problem: raising the ulimit for file descriptors to a higher value. This is not easy under Linux: you either need to recompile the kernel (which is not an option for us), or to set up a root process somewhere which raises the ulimit (which is a security risk and not easy to implement).
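For reference, one common way to raise the per-user hard limit without recompiling anything is an entry in /etc/security/limits.conf, applied by pam_limits at login (this is a sketch only; the group name "mpiusers" and the value 8192 are made-up examples, and it does require one-time root access to edit the file):

```
# /etc/security/limits.conf (hypothetical entries)
@mpiusers  soft  nofile  8192
@mpiusers  hard  nofile  8192
```

After re-login, each user can then set "ulimit -n" up to the new hard limit without root privileges.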
We also tried setting opal_set_max_sys_limits to 1, as the help text suggests (by adding "-mca opal_set_max_sys_limits 1" to the command line), but we do not see any change in behaviour.
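One way to check whether that parameter had any effect is to print the limits the launched processes actually inherit (a hedged suggestion; the mpiexec invocation in the comment is just how one might wrap it, not a tested recipe):

```shell
# print the soft and hard open-file limits the current process sees;
# run under mpiexec to see what each rank inherits, e.g. (hypothetical):
#   mpiexec -mca opal_set_max_sys_limits 1 -np 2 sh -c 'ulimit -Sn; ulimit -Hn'
ulimit -Sn
ulimit -Hn
```

If the soft limit printed by the ranks still reads 1024, the parameter evidently did not raise it.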
What is your opinion?

Best regards,
Paul Kapinos
RZ RWTH Aachen

#####################################################
/opt/SUNWhpc/HPC8.2/intel/bin/mpiexec -mca opal_set_max_sys_limits 1 -np 86 a.out