I remember years ago, OpenMPI (version 1.3.3) required the hard/soft open files limits be >= 4096 in order to function when large numbers of slots were requested (with 1.3.3 this was at roughly 85 slots). Is this requirement still present for OpenMPI versions 1.10.1 and greater?
I'm having some issues now with OpenMPI version 1.10.1 that remind me of the issues I had w/1.3.3 where OpenMPI worked fine as long as I don't request too many slots. When I look at the ulimits -a (soft limit) I see: open files (-n) 1024 Ulimits -Ha (hard limit) gives: open files (-n) 4096 I'm getting errors of the form: [csclprd3-5-5:15248] [[40732,0],0] plm:base:receive got update_proc_state for job [40732,1] [csclprd3-6-12:30567] *** Process received signal *** [csclprd3-6-12:30567] Signal: Bus error (7) [csclprd3-6-12:30567] Signal code: Non-existant physical address (2) [csclprd3-6-12:30567] Failing at address: 0x2b3d19f72000 [csclprd3-6-12:30568] *** Process received signal *** [csclprd3-6-12:30567] [ 0] /lib64/libpthread.so.0(+0xf500)[0x2b3d0f71f500] [csclprd3-6-12:30567] [ 1] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_shmem_mmap.so(+0x1524)[0x2b3d10cb0524] [csclprd3-6-12:30567] [ 2] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_btl_vader.so(+0x3674)[0x2b3d18494674] [csclprd3-6-12:30567] [ 3] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_btl_base_select+0x117)[0x2b3d0f4b0b07] [csclprd3-6-12:30567] [ 4] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x2b3d13d917b2] [csclprd3-6-12:30567] [ 5] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_bml_base_init+0x99)[0x2b3d0f4b0309] [csclprd3-6-12:30567] [ 6] /hpc/apps/mpi/openmpi/1.10.1/lib/openmpi/mca_pml_ob1.so(+0x538c)[0x2b3d18ac238c] [csclprd3-6-12:30567] [ 7] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(mca_pml_base_select+0x1e0)[0x2b3d0f4c1780] [csclprd3-6-12:30567] [ 8] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(ompi_mpi_init+0x51d)[0x2b3d0f47317d] [csclprd3-6-12:30567] [ 9] /hpc/apps/mpi/openmpi/1.10.1/lib/libmpi.so.12(MPI_Init+0x170)[0x2b3d0f492820] [csclprd3-6-12:30567] [10] /hpc/home/lanew/mpi/openmpi/a_1_10_1.out[0x400ad0] [csclprd3-6-12:30567] [11] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2b3d0f94bcdd] [csclprd3-6-12:30567] [12] /hpc/home/lanew/mpi/openmpi/a_1_10_1.out[0x400999] [csclprd3-6-12:30567] *** End of error message *** Ugh. Bill L. IMPORTANT WARNING: This message is intended for the use of the person or entity to which it is addressed and may contain information that is privileged and confidential, the disclosure of which is governed by applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this information is strictly prohibited. Thank you for your cooperation.