On 06.02.2012 at 22:28, Tom Bryan wrote:

> On 2/6/12 8:14 AM, "Reuti" <re...@staff.uni-marburg.de> wrote:
> 
>>> If I need MPI_THREAD_MULTIPLE, and Open MPI is compiled with thread support,
>>> it's not clear to me whether MPI::Init_thread() and
>>> MPI::Init_thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
>>> Open MPI.
>> 
>> If you need thread support, you will need MPI::Init_thread, and it takes one
>> argument (or three).
> 
> Sorry, typo on my side.  I meant to compare
> MPI::Init_thread(MPI::THREAD_MULTIPLE) and MPI::Init().  I think that your
> first reply mentioned replacing MPI::Init_thread by MPI::Init.

Yes, if you don't need threads, I don't see any reason why it would add 
anything to the environment that you could make use of.
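
Just as a minimal sketch (using the one-argument form of the C++ bindings; the 
printf is only for illustration), you can check which level Open MPI actually 
grants when you ask for MPI::THREAD_MULTIPLE:

  #include <mpi.h>
  #include <cstdio>

  int main()
  {
      // One-argument form: request full thread support; the return
      // value is the level the library actually provides.
      int provided = MPI::Init_thread(MPI::THREAD_MULTIPLE);
      if (provided < MPI::THREAD_MULTIPLE)
          std::printf("only got thread level %d\n", provided);
      MPI::Finalize();
      return 0;
  }

If the library was built without full thread support, provided will come back 
lower than MPI::THREAD_MULTIPLE, which alone can explain odd behavior in 
threaded runs.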


>>> <snip>
>> 
>> What is the setting in SGE for:
>> 
>> $ qconf -sconf
>> ...
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>> If it's set to use ssh,
> 
> Nope.  My output is the same as yours.
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin

Fine.


>> But I wonder why it's working for some nodes?
> 
> I don't think that it's working on some nodes.  In my other cases where it
> hangs, I don't always get those "connection refused" errors.

If "builtin" is used, there is no reason to get "connection refused". The error 
message from Open MPI should be different in case of a closed firewall IIRC.


> I'm not sure, but the "connection refused" errors might be a red herring.
> The machines' primary NICs are on a different private network (172.28.*.*).
> The 192.168.122.1 address is actually the machine's own virbr0 device, which
> the documentation says is a "xen interface used by virtualization guest and
> host OSes for network communication."

By default, Open MPI uses the primary interface for its communication AFAIK.
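
If the virbr0 address really is getting picked up, you could try excluding it 
explicitly (just a sketch; the interface names on your nodes may differ):

$ mpiexec --mca btl_tcp_if_exclude lo,virbr0 \
          --mca oob_tcp_if_exclude lo,virbr0 -np 1 mpitest

That would at least tell you whether the 192.168.122.1 messages are a red 
herring or not.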


>> Are there custom configurations per node, and are some of them faulty?
> 
> I did a qconf -sconf machine for each host in my grid.  I get identical
> output like this for each machine.
> $ qconf -sconf grid-03
> #grid-03.cisco.com:
> mailer                       /bin/mail
> xterm                        /usr/bin/xterm
> 
> So, I think that the SGE config is the same across those machines.

Yes, ok. Then it's fine.


>>> <snip>
>>> 3. Experiment "d" was similar to "b", but I use mpi.sh uses "mpiexec -np 1
>>> mpitest" instead of running mpitest directly.  Now both the single machine
>>> queue and multiple machine queue work.  So, mpiexec seems to make my
>>> multi-machine configuration happier.  In this case, I'm still using "-pe
>>> orte 5-", and I'm still seeing the extra SLAVE slots granted in qstat -g t.
>> 
>> Then case a) could show a bug in 1.5.4. For me both were working, but the
> 
> OK.  That helps to explain my confusion.  Our previous experiments (where I
> was told that case (a) was working) were with Open MPI 1.4.x.  Should I open
> a bug for this issue?

I'm not sure, as for me it's working. Maybe it really has something to do with 
the virtual machine setup.
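
Just to make sure we mean the same thing: I assume your mpi.sh is essentially 
something along these lines (a sketch only; the directives besides "-pe orte 5-" 
and the mpiexec line are guesses):

  #!/bin/sh
  #$ -S /bin/sh
  #$ -pe orte 5-
  #$ -cwd
  mpiexec -np 1 mpitest

With the tight integration, mpiexec will then start its daemons on the granted 
slave hosts for you.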


>> Yes, this should work across multiple machines. And it's using `qrsh -inherit
>> ...` so it's failing somewhere in Open MPI - is it working with 1.4.4?
> 
> I'm not sure.  We no longer have our 1.4 test environment, so I'm in the
> process of building that now.  I'll let you know once I have a chance to run
> that experiment.

Ok.
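
If you want to double-check SGE's built-in startup on its own, one thing you 
could try from inside a job script (grid-03 just as an example of a granted 
slave host) is:

$ qrsh -inherit grid-03 hostname

If that hangs or fails as well, the problem is on the SGE side rather than in 
Open MPI.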

-- Reuti
