I have had to cobble together two machines in our Rocks cluster without
using the standard installation. They have an EFI-only BIOS, which Rocks
doesn't support, so this was the only workaround.
Everything works great now, except for one thing. MPI jobs (Open MPI or
MPICH) fail when started from one of these nodes (via qsub, or by logging
in and running the command directly) if 24 or more processes are needed
on another node. However, if the originator of the MPI job is the head
node or any of the preexisting compute nodes, it works fine. Right now I
am guessing an ssh client or ulimit problem, but I cannot find any
difference between the nodes. Any help would be greatly appreciated.
compute-2-1 and compute-2-0 are the new nodes.
Examples:
This works; it prints 23 hostnames from each machine:
[root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 46 hostname
This does not work; it only prints 24 hostnames, all from compute-2-1:
[root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 48 hostname
These both work and print 64 hostnames from each node:
[root@biocluster ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 128 hostname
[root@compute-0-2 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 128 hostname
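One thing I still plan to try is turning up Open MPI's launcher verbosity
on the failing case, to see whether it is the ssh to compute-2-0 or the
remote orted that dies (a sketch; --debug-daemons and the plm_base_verbose
MCA parameter are standard Open MPI options, and 5 is just an arbitrary
verbosity level):

[root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun --debug-daemons --mca plm_base_verbose 5 -host compute-2-0,compute-2-1 -np 48 hostname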
[root@compute-2-1 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 16410016
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
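One caveat about the output above: it is from an interactive login, while
the daemons mpirun starts run under a non-interactive ssh shell, which can
pick up different limits (e.g. from /etc/security/limits.conf via
pam_limits). So it may be worth comparing what a non-interactive shell
gets on a new node versus a preexisting one (a sketch; compute-0-2 stands
in for any of the old nodes):

[root@compute-2-1 ~]# diff <(ssh compute-2-0 ulimit -a) <(ssh compute-0-2 ulimit -a)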
[root@compute-2-1 ~]# more /etc/ssh/ssh_config
Host *
CheckHostIP no
ForwardX11 yes
ForwardAgent yes
StrictHostKeyChecking no
UsePrivilegedPort no
Protocol 2,1
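As far as I can tell this matches the preexisting nodes. To rule out the
ssh client anyway, I can at least compare a verbose single connection from
a new node against the same thing from an old node (only a rough check,
since mpirun opens one ssh per remote node and the orted forks the ranks
from there):

[root@compute-2-1 ~]# ssh -v compute-2-0 hostname 2>&1 | tail -20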