I wrote:
> E.g. on
> 8-core nodes, if you submit a 16-process job, there are four cores left
> over on the relevant nodes which might get something else scheduled on
> them.
Of course, that doesn't make much sense because I thought `12' and typed
`16' for some reason... Thanks to Rolf for off-li…
Rolf Vandevaart writes:
> No, orte_leave_session_attached is needed to avoid the errno=2 errors
> from the sm btl. (It is fixed in 1.3.2 and trunk)
[It does cause other trouble, but I forget what the exact behaviour was
when I lost it as a default.]
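A minimal sketch of setting that MCA parameter, using the usual Open MPI
mechanisms (command line or the per-user parameter file); the job line
itself is illustrative:

  # on the mpirun command line
  mpirun -mca orte_leave_session_attached 1 -np $NSLOTS ./a.out

  # or persistently, in $HOME/.openmpi/mca-params.conf
  orte_leave_session_attached = 1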
>> Yes, but there's a problem with the recomm…
Thanks.
$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v -np $NSLOTS \
    --host node0001,node0002 hostname
$ cat HPL_8cpu_GB.o46
== ALLOCATED NODES
Thanks. I've tried your suggestion.
$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun -mca ras_gridengine_verbose 100 -v -np $NSLOTS \
    --host node0001,node0002 hostname
It allocated 2 nodes to run; however, all…
Rolf has correctly reminded me that display-allocation occurs prior to
host filtering, so you will see all of the allocated nodes. You'll see
the impact of the host specifications in display-map…
Sorry for the confusion - thanks to Rolf for pointing it out.
Ralph
On Apr 1, 2009, at 7:40 AM, …
As an FYI: you can debug allocation issues more easily by:
mpirun --display-allocation --do-not-launch -n 1 foo
This will read the allocation, do whatever host filtering you specify
with -host and -hostfile options, report out the result, and then
terminate without trying to launch anything.
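Applied to the job script above, a minimal sketch (same hostnames; since
nothing is launched, the trailing executable is just a placeholder):

  /opt/openmpi-gcc/bin/mpirun --display-allocation --display-map --do-not-launch \
      -np $NSLOTS --host node0001,node0002 hostname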
It turns out that --host and --hostfile act as a filter of which nodes
to run on when you are running under SGE. So, listing them several
times does not affect where the processes land. However, this still
does not explain why you are seeing what you are seeing. One thing you
can…
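As a concrete sketch of that filtering (node names as in the scripts
above, ./a.out standing in for the real binary), repeating a host in the
list does not add ranks on it:

  # under SGE, --host only filters the allocated nodes; the duplicate is ignored
  mpirun -np $NSLOTS --host node0001,node0001,node0002 ./a.out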
Dear Rolf,
Thanks for your reply.
I've created another PE and changed the submission script, explicitly
specifying the hostnames with "--host".
However, the result is the same.
# qconf -sp orte
pe_name            orte
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    …
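For comparison, a sketch of a PE as commonly configured for Open MPI
tight integration under SGE; the values are illustrative, not taken from
this cluster, and control_slaves/allocation_rule are the fields that
usually matter:

  pe_name            orte
  slots              8
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    $round_robin
  control_slaves     TRUE
  job_is_first_task  FALSE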
On 03/31/09 14:50, Dave Love wrote:
> Rolf Vandevaart writes:
>>> However, I found that if I explicitly specify the "-machinefile
>>> $TMPDIR/machines", all 8 mpi processes were spawned within a single
>>> node, i.e. node0002.
> I had that sort of behaviour recently when the tight integration was
> broken on…
Rolf Vandevaart writes:
>> However, I found that if I explicitly specify the "-machinefile
>> $TMPDIR/machines", all 8 mpi processes were spawned within a single
>> node, i.e. node0002.
I had that sort of behaviour recently when the tight integration was
broken on the installation we'd been given…
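A quick way to check whether the SGE support is actually compiled into
the installation (a sketch, assuming ompi_info comes from the same
/opt/openmpi-gcc tree):

  $ /opt/openmpi-gcc/bin/ompi_info | grep gridengine
  # a "gridengine" ras component should be listed if it was built --with-sge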
On 03/31/09 11:43, PN wrote:
> Dear all,
> I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
> I have 2 compute nodes for testing, each node has a single quad core CPU.
> Here is my submission script and PE config:
> $ cat hpl-8cpu.sge
> #!/bin/bash
> #
> #$ -N HPL_8cpu_IB
> #$ -pe mpi-fu 8
> #$ -cwd
> #$ -j y
> #$ …
Dear all,
I'm using Open MPI 1.3.1 and SGE 6.2u2 on CentOS 5.2
I have 2 compute nodes for testing, each node has a single quad core CPU.
Here is my submission script and PE config:
$ cat hpl-8cpu.sge
#!/bin/bash
#
#$ -N HPL_8cpu_IB
#$ -pe mpi-fu 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
cd /home/
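For what it's worth, with working tight integration no machinefile is
needed at all; a minimal sketch (the xhpl binary name is an assumption):

  # SGE's slot allocation is picked up automatically by mpirun
  /opt/openmpi-gcc/bin/mpirun -np $NSLOTS ./xhpl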