Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-02 Thread Dave Love
I wrote:
> E.g. on 8-core nodes, if you submit a 16-process job, there are four cores
> left over on the relevant nodes which might get something else scheduled on
> them.
Of course, that doesn't make much sense because I thought `12' and typed `16'
for some reason... Thanks to Rolf for off-li…

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Dave Love
Rolf Vandevaart writes:
> No, orte_leave_session_attached is needed to avoid the errno=2 errors
> from the sm btl. (It is fixed in 1.3.2 and trunk)
[It does cause other trouble, but I forget what the exact behaviour was when I
lost it as a default.]
>> Yes, but there's a problem with the recomm…
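For reference, setting that MCA parameter explicitly on the mpirun command line would look roughly like the sketch below; this is a workaround sketch for 1.3.1 only, since the message above says the problem is fixed in 1.3.2 and trunk, and the application name is a placeholder:

  # Keep the job attached to the SGE session to avoid the errno=2 errors
  # from the sm btl (workaround sketch for Open MPI 1.3.1 under SGE):
  mpirun -mca orte_leave_session_attached 1 -np $NSLOTS ./my_mpi_app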

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread PN
Thanks.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v -np $NSLOTS --host node0001,node0002 hostname

$ cat HPL_8cpu_GB.o46
== ALLOCATED NODES …

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread PN
Thanks. I've tried your suggestion.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun -mca ras_gridengine_verbose 100 -v -np $NSLOTS --host node0001,node0002 hostname

It allocated 2 nodes to run, however all …

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain
Rolf has correctly reminded me that display-allocation occurs prior to host
filtering, so you will see all of the allocated nodes. You'll see the impact of
the host specifications in display-map. Sorry for the confusion - thanks to
Rolf for pointing it out.

Ralph

On Apr 1, 2009, at 7:40 AM, …
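A minimal sketch showing both views in one run, so the full SGE allocation can be compared with the map that results after -host filtering (the node name and binary are placeholders):

  # --display-allocation: nodes SGE granted, printed before -host filtering
  # --display-map:        where ranks will actually run, printed after filtering
  mpirun --display-allocation --display-map -np $NSLOTS --host node0001 ./my_mpi_app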

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain
As an FYI: you can debug allocation issues more easily by:

mpirun --display-allocation --do-not-launch -n 1 foo

This will read the allocation, do whatever host filtering you specify with
-host and -hostfile options, report out the result, and then terminate without
trying to launch anything.
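A minimal sketch of wrapping that check in an SGE job script, assuming the same orte PE used elsewhere in this thread (the job name and slot count are placeholders):

  #!/bin/bash
  #$ -N alloc_check
  #$ -pe orte 8
  #$ -cwd -j y -S /bin/bash -V
  # Report the allocation as Open MPI sees it, then exit without launching anything.
  mpirun --display-allocation --do-not-launch -n 1 true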

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Rolf Vandevaart
It turns out that --host and --hostfile act as a filter for which nodes to run
on when you are running under SGE. So, listing hosts several times does not
affect where the processes land. However, this still does not explain why you
are seeing what you are seeing. One thing you can …
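In other words, a sketch only, using the node names from this thread: under SGE the following two commands should place processes identically, because -host merely filters the SGE allocation and repeating a host adds no slots (the binary is a placeholder):

  mpirun -np $NSLOTS --host node0001,node0002 ./my_mpi_app
  mpirun -np $NSLOTS --host node0001,node0001,node0002,node0002 ./my_mpi_app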

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread PN
Dear Rolf,

Thanks for your reply. I've created another PE and changed the submission
script to explicitly specify the hostname with "--host". However, the result
is the same.

# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    …
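For comparison, a typical tightly-integrated PE for Open MPI under SGE looks roughly like the sketch below; these values are assumptions based on the usual Open MPI/SGE setup, not PN's actual configuration, which is truncated above:

  pe_name            orte
  slots              8
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    $round_robin
  control_slaves     TRUE
  job_is_first_task  FALSE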

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread Rolf Vandevaart
On 03/31/09 14:50, Dave Love wrote:
Rolf Vandevaart writes:
However, I found that if I explicitly specify the "-machinefile
$TMPDIR/machines", all 8 mpi processes were spawned within a single node,
i.e. node0002.
I had that sort of behaviour recently when the tight integration was broken on …

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread Dave Love
Rolf Vandevaart writes:
>> However, I found that if I explicitly specify the "-machinefile
>> $TMPDIR/machines", all 8 mpi processes were spawned within a single
>> node, i.e. node0002.
I had that sort of behaviour recently when the tight integration was broken on
the installation we'd been give…
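For context, the difference being discussed amounts to the following sketch (the xhpl binary name is a placeholder; $TMPDIR/machines is the host file written by an old-style, loosely-integrated PE):

  # With working tight integration, Open MPI reads the SGE allocation itself:
  mpirun -np $NSLOTS ./xhpl
  # Passing the PE-generated machinefile explicitly only makes sense with the
  # old loose-integration setup:
  mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./xhpl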

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread Rolf Vandevaart
On 03/31/09 11:43, PN wrote:
Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
I have 2 compute nodes for testing; each node has a single quad-core CPU.
Here is my submission script and PE config:

$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ …

[OMPI users] Strange behaviour of SGE+OpenMPI

2009-03-31 Thread PN
Dear all,

I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2.
I have 2 compute nodes for testing; each node has a single quad-core CPU.
Here is my submission script and PE config:

$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/…