On 2/6/12 8:14 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:

>> If I need MPI_THREAD_MULTIPLE, and openmpi is compiled with thread support,
>> it's not clear to me whether MPI::Init_Thread() and
>> MPI::Inint_Thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
>> Open MPI.
> 
> If you need thread support, you will need MPI::Init_thread, and it needs one
> argument (or three).

Sorry, typo on my side.  I meant to compare
MPI::Init_thread(MPI::THREAD_MULTIPLE) and MPI::Init().  I think that your
first reply mentioned replacing MPI::Init_thread by MPI::Init.
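
For reference, this is the kind of check I have in mind (just a minimal
sketch, not our actual test program):

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    /* Request full thread support and check what the library actually
       provides; MPI::Init() alone only guarantees MPI_THREAD_SINGLE. */
    int provided = MPI::Init_thread(MPI::THREAD_MULTIPLE);
    if (provided < MPI::THREAD_MULTIPLE) {
        std::cerr << "MPI_THREAD_MULTIPLE not available, got level "
                  << provided << std::endl;
        MPI::COMM_WORLD.Abort(1);
    }

    /* ... threaded MPI work ... */

    MPI::Finalize();
    return 0;
}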

> I suggest using the stable version 1.4.4 for your experiments. As you said you
> are new to MPI, you could get misled between error messages caused by bugs and
> error messages due to a programming error on your side.

OK.  I'll certainly set it up so that I can validate what's supposed to
work.  I'll have to check with our main MPI developers to see whether
there's anything in 1.5.x that they need.

>> 1. I'm still surprised that the SGE behavior is so different when I
>> configure my SGE queue differently.  See test "a" in the .tgz.  When I just
>> run mpitest in mpi.sh and ask for exactly 5 slots (-pe orte 5-5), it works
>> if the queue is configured to use a single host.  I see 1 MASTER and 4
>> SLAVES in qstat -g t, and I get the correct output.
> 
> Fine. ("job_is_first_task true" in the PE according to this.)

Yes, I believe that job_is_first_task will need to be true for our
environment.

>>  If the queue is set to
>> use multiple hosts, the jobs hang in spawn/init, and I get errors
>> [grid-03.cisco.com][[19159,2],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint
>> _complete_connect] connect() to 192.168.122.1 failed: Connection refused
>> (111)
> 
> What is the setting in SGE for:
> 
> $ qconf -sconf
> ...
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> If it's set to use ssh,

Nope.  My output is the same as yours.
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin


> But I wonder why it's working for some nodes?

I don't think that it's working on some nodes.  In my other cases where it
hangs, I don't always get those "connection refused" errors.

I'm not sure, but the "connection refused" errors might be a red herring.
The machines' primary NICs are on a different private network (172.28.*.*).
The 192.168.122.1 address is actually the machine's own virbr0 device, which
the documentation says is a "xen interface used by Virtualization guest and
host oses for network communication."

> Are there custom configurations per node, and are some of them faulty:

I ran qconf -sconf <hostname> for each host in my grid and got identical
output like this for each machine:
$ qconf -sconf grid-03
#grid-03.cisco.com:
mailer                       /bin/mail
xterm                        /usr/bin/xterm

So, I think that the SGE config is the same across those machines.

>> 2. I guess I'm not sure how SGE is supposed to behave.  Experiment "a" and
>> "b" were identical except that I changed -pe orte 5-5 to -pe orte 5-.  The
>> single case works like before, and the multiple exec host case fails as
>> before.  The difference is that qstat -g t shows additional SLAVEs that
>> don't seem to correspond to any jobs on the exec hosts.  Are these SLAVEs
>> just slots that are reserved for my job but that I'm not using?  If my job
>> will only use 5 slots, then I should set the SGE qsub job to ask for exactly
>> 5 with "-pe orte 5-5", right?
> 
> Correct. The remaining ones are just unused. You could adjust your application
> of course to check how many slots were granted, and start slaves according to
> the information you got to use all granted slots.

OK.  That makes sense.  In our intended uses, I believe that we'll know
exactly how many slots the application will need, and it will use the same
number of slots throughout the entire job.
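
That said, if we ever need to fill whatever was granted, I read your
suggestion as something like this sketch (assumed on my side, not our real
code; "mpitest_slave" is just a placeholder for the slave binary):

#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI::Init_thread(MPI::THREAD_MULTIPLE);

    /* MPI_UNIVERSE_SIZE should reflect the number of slots SGE granted. */
    int* universe_size = 0;
    bool found = MPI::COMM_WORLD.Get_attr(MPI::UNIVERSE_SIZE, &universe_size);

    /* The master is already running, so spawn one slave per remaining slot
       (fall back to 4 if the attribute isn't set). */
    int n_slaves = (found && universe_size != 0) ? *universe_size - 1 : 4;

    MPI::Intercomm children =
        MPI::COMM_WORLD.Spawn("mpitest_slave", MPI::ARGV_NULL, n_slaves,
                              MPI::INFO_NULL, 0);

    /* ... work with the slaves ... */

    MPI::Finalize();
    return 0;
}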

>> 3. Experiment "d" was similar to "b", but mpi.sh uses "mpiexec -np 1
>> mpitest" instead of running mpitest directly.  Now both the single machine
>> queue and multiple machine queue work.  So, mpiexec seems to make my
>> multi-machine configuration happier.  In this case, I'm still using "-pe
>> orte 5-", and I'm still seeing the extra SLAVE slots granted in qstat -g t.
> 
> Then case a) could show a bug in 1.5.4. For me both were working, but the

OK.  That helps to explain my confusion.  Our previous experiments (where I
was told that case (a) was working) were with Open MPI 1.4.x.  Should I open
a bug for this issue?

> allocation was different. I only got the correct allocation with "mpiexec -np
> 1". In your case, 4 were routed to one remote machine: the machine where the
> jobscript runs is usually the first entry in the machinefile, but on grid-03
> you got only one slot by accident, and so the 4 additional ones were routed to
> the next machine it found in the machinefile.

FYI, I think that this particular allocation was a misconfiguration on my
side.  It looks like SGE thinks that grid-03 only has 1 slot available.

>> 4. Based on "d", I thought that I could follow the approach in "a".  That
>> is, for experiment "e", I used mpiexec -np 1, but I also used -pe orte 5-5.
>> I thought that this would make the multi-machine queue reserve only the 5
>> slots that I needed.  The single machine queue works correctly, but now the
>> multi-machine case hangs with no errors.  The output from qstat and pstree
>> is what I'd expect, but it seems to hang in Spawn_multiple and Init_thread.
>> I really expected this to work.
> 
> Yes, this should work across multiple machines. And it's using `qrsh -inherit
> ...` so it's failing somewhere in Open MPI - is it working with 1.4.4?

I'm not sure.  We no longer have our 1.4 test environment, so I'm in the
process of building that now.  I'll let you know once I have a chance to run
that experiment.
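
In case it helps narrow it down, the step that hangs is essentially this
shape (a hypothetical single-binary sketch, not our actual mpitest source):

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
    MPI::Init_thread(MPI::THREAD_MULTIPLE);

    MPI::Intercomm parent = MPI::Comm::Get_parent();
    if (parent == MPI::COMM_NULL) {
        /* Master: spawn 4 copies of this same binary as slaves. */
        const char* commands[1] = { argv[0] };
        int         maxprocs[1] = { 4 };
        MPI::Info   infos[1]    = { MPI::INFO_NULL };

        MPI::Intercomm children =
            MPI::COMM_WORLD.Spawn_multiple(1, commands, MPI::ARGVS_NULL,
                                           maxprocs, infos, 0);
        children.Barrier();
        std::cout << "master: spawn completed" << std::endl;
    } else {
        /* Slave: just synchronize with the master and exit. */
        parent.Barrier();
    }

    MPI::Finalize();
    return 0;
}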

Thanks,
---Tom
