Re: [OMPI users] Open MPI-1.5.4 --with-mpi-f90-size=large Compilation Error

2012-02-06 Thread Jeff Squyres
This is a known problem and is unlikely to be fixed.  The solution is simply to 
use the medium size f90 module, which means you won't have strict type checking 
on all MPI functions that take 2 choice buffers (e.g., MPI_SCATTERV).
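
If you configured with the large size, switching to medium is just a matter of 
re-running configure; roughly something like this (a sketch; add whatever other 
options you normally pass):

  ./configure --with-mpi-f90-size=medium
  make all install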

In the OMPI v1.7 series, we have a wholly revamped set of Fortran support 
coming.  Unfortunately, it won't work for gfortran because of a lack of features 
in that compiler.  For (current versions of) gfortran, the level of Fortran 
support is staying the same in Open MPI v1.7 (i.e., the same "sizes" as today, 
and no "large" size).  :-(


On Feb 4, 2012, at 10:56 PM, Rashid, Z. (Zahid) wrote:

> Dear Open MPI users,
> 
> I want to compile Open MPI-1.5.4 beta on my MacBook Pro (with 
> GCC-4.6.2/Gfortran-4.6.2 installed) with the option "configure 
> --with-mpi-f90-size=large".  The configuration script runs OK, but during 
> compilation I get the following warnings, which after a limit of 25 turn into 
> an error.
> 
>   FC mpi_scatterv_f90.lo
> mpi_scatterv_f90.f90:17.12:
> 
>   print *, "Open MPI WARNING: You are calling MPI_SCATTERV with incorrect 
> sendc
> 1
> Error: Unterminated character constant beginning at (1)
> mpi_scatterv_f90.f90:55.12:
> 
>   print *, "Open MPI WARNING: You are calling MPI_SCATTERV with incorrect 
> sendc
> 1
> Error: Unterminated character constant beginning at (1)
>   FC mpi_sendrecv_f90.lo
> mpi_scatterv_f90.f90:93.12:
> 
>   print *, "Open MPI WARNING: You are calling MPI_SCATTERV with incorrect 
> sendc
> 1
> Error: Unterminated character constant beginning at (1)
> [... the same "Unterminated character constant" error repeats for 
> mpi_scatterv_f90.f90 at lines 131.12, 169.12, 207.12, and so on every 38 
> lines through 777.12, at which point the output is cut off ...]

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-06 Thread Reuti
Am 04.02.2012 um 00:15 schrieb Tom Bryan:

> OK. I misunderstood you.  I thought that you were saying that spawn_multiple
> had to call mpiexec for each spawned process.  If you just meant that mpi.sh
> should launch the initial process with mpiexec, that seems reasonable.  I
> tried it with and without, and I definitely get better results when using
> mpiexec.  

Yep.


> If I need MPI_THREAD_MULTIPLE, and openmpi is compiled with thread support,
> it's not clear to me whether MPI::Init_Thread() and
> MPI::Inint_Thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
> Open MPI.

If you need thread support, you will need MPI::Init_Thread and it needs one 
argument (or three).

The MPI 2.2 standard covers this:

http://www.mpi-forum.org/docs/

(page 384)
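
As an aside: you can check what thread support your Open MPI build actually 
provides with ompi_info (a quick check; the exact wording of that output line 
varies between versions):

$ ompi_info | grep -i thread

Look for the "Thread support" line; MPI threads must have been enabled at 
configure time (--enable-mpi-threads, IIRC) for MPI_THREAD_MULTIPLE to be usable.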


 NB: What is MPI::Init( MPI::THREAD_MULTIPLE ) supposed to do, output a
 feature of MPI?
> 
>> From the man page:
> MPI_Init_thread, as compared to MPI_Init, has a provision to request a
> certain level of thread support in required.  The level of thread support
> available to the program is set in provided, except in C++, where it is the
> return value of the function.
> 
>> For me it's not hanging. Did you try the alternative startup using mpiexec?
>> Aha - BTW: I use 1.4.4
> 
> Right, I'm on 1.5.4.

I suggest using the stable version 1.4.4 for your experiments. As you said you 
are new to MPI, you could otherwise be misled between error messages caused by 
bugs and error messages caused by a programming error on your side.


> Yes, I did try starting with mpiexec.  That helps, but I still don't know
> whether I understand all of the results.
> 
> For each experiment, I've attached the output of
> qstat -f 
> qstat -g t
> pstree -Aalp 
> output of mpitest parent and children (mpi.sh.o)
> 
> I ran each test with two different SGE queue configurations.  In one case,
> the queue with the orte pe is set to include all 5 exec hosts in my grid.
> In the "single" case, the queue with the orte pe is set to use only a single
> host.  (The queue configuration isn't shown here, but I changed the queue's
> hostlist to use either a single host or a host group that includes all of
> my machines.)
> 
> I run qsub on node 17.  The grid machines available for this run are 3, 4,
> 10, 11, and 16.
> 
> Some observations:
> 
> 1. I'm still surprised that the SGE behavior is so different when I
> configure my SGE queue differently.  See test "a" in the .tgz.  When I just
> run mpitest in mpi.sh and ask for exactly 5 slots (-pe orte 5-5), it works
> if the queue is configured to use a single host.  I see 1 MASTER and 4
> SLAVES in qstat -g t, and I get the correct output.

Fine. ("job_is_first_task true" in the PE according to this.)


>  If the queue is set to
> use multiple hosts, the jobs hang in spawn/init, and I get errors
> [grid-03.cisco.com][[19159,2],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint
> _complete_connect] connect() to 192.168.122.1 failed: Connection refused
> (111)

What is the setting in SGE for:

$ qconf -sconf
...
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

If it's set to use ssh, you will need passphrase-less logins to the other nodes 
or (better) host-based authentication (a one-time setup that covers all users 
going forward):

http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html

But I wonder why it's working for some nodes. Are there custom configurations 
per node, and some of them faulty?

$ qconf -sconfl

And then you can check for each listed one:

$ qconf -sconf grid-04

and so on.

In case you are interested in the meaning and behavior behind these settings:

http://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html


> [grid-10.cisco.com:05327] [[19159,0],3] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> [grid-16.cisco.com:25196] [[19159,0],1] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> [grid-11.cisco.com:63890] [[19159,0],2] routed:binomial: Connection to
> lifeline [[19159,0],0] lost
> So, I'll just assume that mpiexec does some magic that is needed in the
> multi-machine scenario but not in the single machine scenario.
> 
> 2. I guess I'm not sure how SGE is supposed to behave.  Experiment "a" and
> "b" were identical except that I changed -pe orte 5-5 to -pe orte 5-.  The
> single case works like before, and the multiple exec host case fails as
> before.  The difference is that qstat -g t shows additional SLAVEs that
> don't seem to correspond to any jobs on the exec hosts.  Are these SLAVEs
> just slots that are reserved for my job but that I'm not using?  If my job
> will only use 5 slots, then I should set the SGE qsub job to ask for exactly
> 5 with "-pe orte 5-5", right?

Correct. The remaining ones are just unused. You could adjust your application 
of course to check how many slots were granted, and start slaves according to 
the information you got to use all granted slots.
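
For reference, a minimal job script along the lines discussed in this thread 
could look like this (just a sketch; the script and binary names are the ones 
used here, adjust the slot request to your needs):

#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -pe orte 5-5
# mpi.sh: start a single master under SGE's tight integration; the master
# then spawns its slaves inside the granted allocation
mpiexec -np 1 ./mpitest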

Re: [OMPI users] MPI_Barrier, again

2012-02-06 Thread Evgeniy Shapiro
P.S.  I have tested with OpenMPI 1.4.5rc4 and the problem is still there.

Evgeniy


Re: [OMPI users] IO performance

2012-02-06 Thread Rob Latham
On Fri, Feb 03, 2012 at 10:46:21AM -0800, Tom Rosmond wrote:
> With all of this, here is my MPI related question.  I recently added an
> option to use MPI-IO to do the heavy IO lifting in our applications.  I
> would like to know what the relative importance of the dedicated MPI
> network vis-a-vis the GPFS network for typical MPIIO collective reads
> and writes.  I assume there must be some hand-off of data between the
> networks during the process, but how is it done, and are there any rules
> to help understand it.  Any insights would be welcome.

There's not really a handoff.  MPI-IO on GPFS will call a posix read()
or write() system call after possibly doing some data massaging.  That
system call sends data over the storage network.

If you've got a fast communication network but a slow storage network,
then some of the MPI-IO optimizations will need to be adjusted a bit.
Seems like you'd want to really beef up the "cb_buffer_size".

For GPFS, the big thing MPI-IO can do for you is align writes to
GPFS.  See my next point.

> P.S.  I am running with Open-mpi 1.4.2.

If you upgrade to something in the 1.5 series you will get some nice
ROMIO optimizations that will help you out with writes to GPFS if 
you set the "striping_unit" hint to the GPFS block size.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


Re: [OMPI users] IO performance

2012-02-06 Thread Tom Rosmond
Rob

Thanks, these are the kind of suggestions I was looking for.  I will try
them.  But I will have to twist some arms to get the 1.5 upgrade.  I
might just install a private copy for my tests.
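
I assume the private copy is just something along these lines (a sketch; the 
prefix is only an example):

  ./configure --prefix=$HOME/openmpi-1.5
  make -j4 all install
  export PATH=$HOME/openmpi-1.5/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/openmpi-1.5/lib:$LD_LIBRARY_PATH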

T. Rosmond


On Mon, 2012-02-06 at 10:21 -0600, Rob Latham wrote:
> On Fri, Feb 03, 2012 at 10:46:21AM -0800, Tom Rosmond wrote:
> > With all of this, here is my MPI related question.  I recently added an
> > option to use MPI-IO to do the heavy IO lifting in our applications.  I
> > would like to know what the relative importance of the dedicated MPI
> > network vis-a-vis the GPFS network for typical MPIIO collective reads
> > and writes.  I assume there must be some hand-off of data between the
> > networks during the process, but how is it done, and are there any rules
> > to help understand it.  Any insights would be welcome.
> 
> There's not really a handoff.  MPI-IO on GPFS will call a posix read()
> or write() system call after possibly doing some data massaging.  That
> system call sends data over the storage network.
> 
> If you've got a fast communication network but a slow storage network,
> then some of the MPI-IO optimizations will need to be adjusted a bit.
> Seems like you'd want to really beef up the "cb_buffer_size".
> 
> For GPFS, the big thing MPI-IO can do for you is align writes to
> GPFS.  see my next point.
> 
> > P.S.  I am running with Open-mpi 1.4.2.
> 
> If you upgrade to something in the 1.5 series you will get some nice
> ROMIO optimizations that will help you out with writes to GPFS if 
> you set the "striping_unit" hint to the GPFS block size.
> 
> ==rob
> 



Re: [OMPI users] IO performance

2012-02-06 Thread Richard Walsh

Tom/All,

In case it is not already obvious, the GPFS Linux kernel module
takes care of the interaction between the Linux IO stack, POSIX,
and the GPFS layer underneath.  MPI-IO interacts with the kernel,
modified in this way, through the POSIX API.

Another item that is perhaps slightly off topic, but one that provides a nice 
overview of some basic GPFS concepts and compares GPFS to Lustre: it describes 
the mixed Lustre and GPFS storage architecture in use at NERSC.

Hope you find it useful:

http://www.cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/01-5Monday/3A-Canon/canon-paper.pdf

Cheers,

rbw

Richard Walsh
Parallel Applications and Systems Manager
CUNY HPC Center, Staten Island, NY
W: 718-982-3319
M: 612-382-4620

Miracles are delivered to order by great intelligence, or when it is
absent, through the passage of time and a series of mere chance
events. -- Max Headroom


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Tom 
Rosmond [rosm...@reachone.com]
Sent: Monday, February 06, 2012 11:39 AM
To: Open MPI Users
Subject: Re: [OMPI users] IO performance

Rob

Thanks, these are the kind of suggestions I was looking for.  I will try
them.  But I will have to twist some arms to get the 1.5 upgrade.  I
might just install a private copy for my tests.

T. Rosmond


On Mon, 2012-02-06 at 10:21 -0600, Rob Latham wrote:
> On Fri, Feb 03, 2012 at 10:46:21AM -0800, Tom Rosmond wrote:
> > With all of this, here is my MPI related question.  I recently added an
> > option to use MPI-IO to do the heavy IO lifting in our applications.  I
> > would like to know what the relative importance of the dedicated MPI
> > network vis-a-vis the GPFS network for typical MPIIO collective reads
> > and writes.  I assume there must be some hand-off of data between the
> > networks during the process, but how is it done, and are there any rules
> > to help understand it.  Any insights would be welcome.
>
> There's not really a handoff.  MPI-IO on GPFS will call a posix read()
> or write() system call after possibly doing some data massaging.  That
> system call sends data over the storage network.
>
> If you've got a fast communication network but a slow storage network,
> then some of the MPI-IO optimizations will need to be adjusted a bit.
> Seems like you'd want to really beef up the "cb_buffer_size".
>
> For GPFS, the big thing MPI-IO can do for you is align writes to
> GPFS.  see my next point.
>
> > P.S.  I am running with Open-mpi 1.4.2.
>
> If you upgrade to something in the 1.5 series you will get some nice
> ROMIO optimizations that will help you out with writes to GPFS if
> you set the "striping_unit" hint to the GPFS block size.
>
> ==rob
>


Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-06 Thread Tom Bryan
On 2/6/12 8:14 AM, "Reuti"  wrote:

>> If I need MPI_THREAD_MULTIPLE, and openmpi is compiled with thread support,
>> it's not clear to me whether MPI::Init_Thread() and
>> MPI::Inint_Thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
>> Open MPI.
> 
> If you need thread support, you will need MPI::Init_Thread and it needs one
> argument (or three).

Sorry, typo on my side.  I meant to compare
MPI::Init_thread(MPI::THREAD_MULTIPLE) and MPI::Init().  I think that your
first reply mentioned replacing MPI::Init_thread by MPI::Init.

> I suggest using the stable version 1.4.4 for your experiments. As you said you
> are new to MPI, you could otherwise be misled between error messages caused by
> bugs and error messages caused by a programming error on your side.

OK.  I'll certainly set it up so that I can validate what's supposed to
work.  I'll have to check with our main MPI developers to see whether
there's anything in 1.5.x that they need.

>> 1. I'm still surprised that the SGE behavior is so different when I
>> configure my SGE queue differently.  See test "a" in the .tgz.  When I just
>> run mpitest in mpi.sh and ask for exactly 5 slots (-pe orte 5-5), it works
>> if the queue is configured to use a single host.  I see 1 MASTER and 4
>> SLAVES in qstat -g t, and I get the correct output.
> 
> Fine. ("job_is_first_task true" in the PE according to this.)

Yes, I believe that job_is_first_task will need to be true for our
environment.

>>  If the queue is set to
>> use multiple hosts, the jobs hang in spawn/init, and I get errors
>> [grid-03.cisco.com][[19159,2],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint
>> _complete_connect] connect() to 192.168.122.1 failed: Connection refused
>> (111)
> 
> What is the setting in SGE for:
> 
> $ qconf -sconf
> ...
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> If it's set to use ssh,

Nope.  My output is the same as yours.
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin


> But I wonder why it's working for some nodes?

I don't think that it's working on some nodes.  In my other cases where it
hangs, I don't always get those "connection refused" errors.

I'm not sure, but the "connection refused" errors might be a red herring.
The machines' primary NICs are on a different private network (172.28.*.*).
The 192.168.122.1 address is actually the machine's own virbr0 device, which
> the documentation says is a "xen interface used by Virtualization guest and
host oses for network communication."

> Are there custom configurations per node, and some of them faulty?

I did a qconf -sconf machine for each host in my grid.  I get identical
output like this for each machine.
$ qconf -sconf grid-03
#grid-03.cisco.com:
mailer                       /bin/mail
xterm                        /usr/bin/xterm

So, I think that the SGE config is the same across those machines.

>> 2. I guess I'm not sure how SGE is supposed to behave.  Experiment "a" and
>> "b" were identical except that I changed -pe orte 5-5 to -pe orte 5-.  The
>> single case works like before, and the multiple exec host case fails as
>> before.  The difference is that qstat -g t shows additional SLAVEs that
>> don't seem to correspond to any jobs on the exec hosts.  Are these SLAVEs
>> just slots that are reserved for my job but that I'm not using?  If my job
>> will only use 5 slots, then I should set the SGE qsub job to ask for exactly
>> 5 with "-pe orte 5-5", right?
> 
> Correct. The remaining ones are just unused. You could adjust your application
> of course to check how many slots were granted, and start slaves according to
> the information you got to use all granted slots.

OK.  That makes sense.  In our intended uses, I believe that we'll know
exactly how many slots the application will need, and it will use the same
number of slots throughout the entire job.

>> 3. Experiment "d" was similar to "b", but mpi.sh uses "mpiexec -np 1
>> mpitest" instead of running mpitest directly.  Now both the single machine
>> queue and multiple machine queue work.  So, mpiexec seems to make my
>> multi-machine configuration happier.  In this case, I'm still using "-pe
>> orte 5-", and I'm still seeing the extra SLAVE slots granted in qstat -g t.
> 
> Then case a) could show a bug in 1.5.4. For me both were working, but the

OK.  That helps to explain my confusion.  Our previous experiments (where I
was told that case (a) was working) were with Open MPI 1.4.x.  Should I open
a bug for this issue?

> allocation was different. The correct allocation I only got with "mpiexec -np
> 1". In your case 4 were routed to one remote machine: the machine where the
> jobscript runs i

Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-06 Thread Reuti
Am 06.02.2012 um 22:28 schrieb Tom Bryan:

> On 2/6/12 8:14 AM, "Reuti"  wrote:
> 
>>> If I need MPI_THREAD_MULTIPLE, and openmpi is compiled with thread support,
>>> it's not clear to me whether MPI::Init_Thread() and
>>> MPI::Inint_Thread(MPI::THREAD_MULTIPLE) would give me the same behavior from
>>> Open MPI.
>> 
>> If you need thread support, you will need MPI::Init_Thread and it needs one
>> argument (or three).
> 
> Sorry, typo on my side.  I meant to compare
> MPI::Init_thread(MPI::THREAD_MULTIPLE) and MPI::Init().  I think that your
> first reply mentioned replacing MPI::Init_thread by MPI::Init.

Yes, if you don't need threads, I don't see any reason why it should add 
anything to the environment that you could make use of.


>>> 
>> 
>> What is the setting in SGE for:
>> 
>> $ qconf -sconf
>> ...
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>> If it's set to use ssh,
> 
> Nope.  My output is the same as yours.
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin

Fine.


>> But I wonder why it's working for some nodes?
> 
> I don't think that it's working on some nodes.  In my other cases where it
> hangs, I don't always get those "connection refused" errors.

If "builtin" is used, there is no reason to get "connection refused". The error 
message from Open MPI should be different in case of a closed firewall IIRC.


> I'm not sure, but the "connection refused" errors might be a red herring.
> The machines' primary NICs are on a different private network (172.28.*.*).
> The 192.168.122.1 address is actually the machine's own virbr0 device, which
> the documentation says is a "xen interface used by Virtualization guest and
> host oses for network communication."

By default Open MPI is using the primary interface for its communication AFAIK.
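
If that virbr0 address is being picked up, you can tell Open MPI explicitly 
which interfaces to avoid (or use) via MCA parameters.  A sketch; the interface 
names are only the ones mentioned in this thread:

$ mpiexec --mca btl_tcp_if_exclude lo,virbr0 \
          --mca oob_tcp_if_exclude lo,virbr0 -np 1 ./mpitest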


>> Are there custom configurations per node, and some of them faulty?
> 
> I did a qconf -sconf machine for each host in my grid.  I get identical
> output like this for each machine.
> $ qconf -sconf grid-03
> #grid-03.cisco.com:
> mailer                       /bin/mail
> xterm                        /usr/bin/xterm
> 
> So, I think that the SGE config is the same across those machines.

Yes, ok. Then it's fine.


>>> 
>>> 3. Experiment "d" was similar to "b", but mpi.sh uses "mpiexec -np 1
>>> mpitest" instead of running mpitest directly.  Now both the single machine
>>> queue and multiple machine queue work.  So, mpiexec seems to make my
>>> multi-machine configuration happier.  In this case, I'm still using "-pe
>>> orte 5-", and I'm still seeing the extra SLAVE slots granted in qstat -g t.
>> 
>> Then case a) could show a bug in 1.5.4. For me both were working, but the
> 
> OK.  That helps to explain my confusion.  Our previous experiments (where I
> was told that case (a) was working) were with Open MPI 1.4.x.  Should I open
> a bug for this issue?

I'm not sure, as for me it's working. Maybe it really has something to do with 
the virtual machine setup.


>> Yes, this should work across multiple machines. And it's using `qrsh -inherit
>> ...` so it's failing somewhere in Open MPI - is it working with 1.4.4?
> 
> I'm not sure.  We no longer have our 1.4 test environment, so I'm in the
> process of building that now.  I'll let you know once I have a chance to run
> that experiment.

Ok.

-- Reuti


[OMPI users] O-MPI Support for Windows 7

2012-02-06 Thread James Torossian
Hi all,

I am trying to setup Open-MPI across two Windows 7 machines with UAC
disabled ..

Cygwin with OpenSSH is installed, and I can successfully ssh to each machine
without entry of username and password:

JimT@JimT-PC ~
$ ssh NanoOneQuad
Last login: Tue Feb  7 01:42:02 2012 from jimt-pc

JimT@NanoOneQuad ~
$

Regardless of this, mpirun insists on asking for a username and password;
then asks to save credentials, but if selected, responds with not
implemented. If saving credentials is not selected, then I can see that the
task starts on the other machine (in task manager) and that the task runs to
completion OK:

JimT@JimT-PC ~
$ mpirun -H NanoOneQuad ipconfig.exe
connecting to NanoOneQuad
username:JimT
password:**
Save Credential?(Y/N) y
[JimT-PC:03784] This feature hasn't been implemented yet.

JimT@JimT-PC ~
$ mpirun -H NanoOneQuad ipconfig.exe
connecting to NanoOneQuad
username:JimT
password:**
Save Credential?(Y/N) n

JimT@JimT-PC ~
$

Please let me know what I have missed. I have gone through the FAQs and have
rebuilt the windows version but can't seem to get past this.

Thanks and best regards,

Jim





Re: [OMPI users] O-MPI Support for heterogeneous (Windows / Linux) clusters

2012-02-06 Thread George Bosilca

On Feb 4, 2012, at 17:57 , Ralph Castain wrote:

>> Has it been possible to build O-MPI for Windows (with perhaps reduced 
>> capabilities) to use ssh under Cygwin rather than WMI? We are only after a 
>> small subset of MPI functionality.

You should be able to build Open MPI on Windows either under Cygwin or MinGW to 
support only ssh. In this case, however, OMPI will behave like its Unix 
counterpart, and no support for any Windows specifics (such as the registry) will 
be enabled.

  george.



Re: [OMPI users] O-MPI Support for Windows 7

2012-02-06 Thread George Bosilca
James,

There is no mention of a username or password anywhere in OMPI. I guess one of 
the applications used in the process, either ssh or ipconfig.exe, is missing the 
context needed to execute without a higher level of credentials.

Can you execute ipconfig.exe once connected through ssh without having to 
provide the username and password? If yes, can you execute hostname through 
mpirun instead of ipconfig.exe?
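
In other words, just these two checks (a sketch):

$ ssh NanoOneQuad ipconfig.exe    # should run without any password prompt
$ mpirun -H NanoOneQuad hostname  # does this one still ask for credentials?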

  george.


On Feb 6, 2012, at 19:05 , James Torossian wrote:

> Hi all,
> I am trying to setup Open-MPI across two Windows 7 machines with UAC disabled 
> ……
>  
> Cygwin with OpenSSH is installed, and I can successfully ssh to each machine 
> without entry of username and password:
>  
> JimT@JimT-PC ~
> $ ssh NanoOneQuad
> Last login: Tue Feb  7 01:42:02 2012 from jimt-pc
>  
> JimT@NanoOneQuad ~
> $
>  
> Regardless of this, mpirun insists on asking for a username and password; 
> then asks to save credentials, but if selected, responds with not 
> implemented. If saving credentials is not selected, then I can see that the 
> task starts on the other machine (in task manager) and that the task runs to 
> completion OK:
>  
> JimT@JimT-PC ~
> $ mpirun -H NanoOneQuad ipconfig.exe
> connecting to NanoOneQuad
> username:JimT
> password:**
> Save Credential?(Y/N) y
> [JimT-PC:03784] This feature hasn't been implemented yet.
>  
> JimT@JimT-PC ~
> $ mpirun -H NanoOneQuad ipconfig.exe
> connecting to NanoOneQuad
> username:JimT
> password:**
> Save Credential?(Y/N) n
>  
> JimT@JimT-PC ~
> $
>  
> Please let me know what I have missed. I have gone through the FAQs and have 
> rebuilt the windows version but can’t seem to get past this.
>  
> Thanks and best regards,
> Jim
>  



Re: [OMPI users] O-MPI Support for Windows 7

2012-02-06 Thread Ralph Castain
Afraid I'm no OpenSSH expert, but it sounds like there is an issue with its 
configuration. Check out the OpenSSH config options to see if something fits.

I did a quick search and found this, as an example:

http://mah.everybody.org/docs/ssh

Note the need to run ssh-agent to cache login credentials.
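
I.e., something along these lines in the shell from which you launch mpirun (a 
sketch; the key path is an assumption):

$ eval $(ssh-agent)
$ ssh-add ~/.ssh/id_rsa
$ ssh NanoOneQuad hostname   # should now run without any prompt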

On Feb 6, 2012, at 5:05 PM, James Torossian wrote:

> Hi all,
> I am trying to setup Open-MPI across two Windows 7 machines with UAC disabled 
> ……
>  
> Cygwin with OpenSSH is installed, and I can successfully ssh to each machine 
> without entry of username and password:
>  
> JimT@JimT-PC ~
> $ ssh NanoOneQuad
> Last login: Tue Feb  7 01:42:02 2012 from jimt-pc
>  
> JimT@NanoOneQuad ~
> $
>  
> Regardless of this, mpirun insists on asking for a username and password; 
> then asks to save credentials, but if selected, responds with not 
> implemented. If saving credentials is not selected, then I can see that the 
> task starts on the other machine (in task manager) and that the task runs to 
> completion OK:
>  
> JimT@JimT-PC ~
> $ mpirun -H NanoOneQuad ipconfig.exe
> connecting to NanoOneQuad
> username:JimT
> password:**
> Save Credential?(Y/N) y
> [JimT-PC:03784] This feature hasn't been implemented yet.
>  
> JimT@JimT-PC ~
> $ mpirun -H NanoOneQuad ipconfig.exe
> connecting to NanoOneQuad
> username:JimT
> password:**
> Save Credential?(Y/N) n
>  
> JimT@JimT-PC ~
> $
>  
> Please let me know what I have missed. I have gone through the FAQs and have 
> rebuilt the windows version but can’t seem to get past this.
>  
> Thanks and best regards,
> Jim
>  