Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
>
> Are you asking to find out this information before issuing "mpirun"?  Open
> MPI does assume that the nodes you are trying to use are reachable.
>
>
 NO,

The scenario is: a pair of processes are running, one in the "master" node (say
"masterprocess") and one in the "slave" node (say "slaveprocess"). When
"masterprocess" needs the service of the slave process, it sends a message to
"slaveprocess" and "slaveprocess" serves its request. In case of a network
failure (by any means), "masterprocess" will keep trying to send messages to
"slaveprocess" without knowing that it is not reachable. So how should
"masterprocess" find out that "slaveprocess" can't be reached, and stop
attempting to send messages until the connection is up again?


Thanks & Regards,
-- 
Vipin K.
Research Engineer,
C-DOTB, India


Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread rahmani
Hi Martin,
I have a question about your solution below:
in step 2, "move the Fortran module to the directory ..." --
what is the "Fortran module"?
in step 3, don't we need to install openmpi?
thanks


- Original Message -
From: "Martin Siegert" 
To: "Open MPI Users" 
Sent: Monday, July 20, 2009 1:47:35 PM (GMT-0500) America/New_York
Subject: Re: [OMPI users] ifort and gfortran module

Hi,

I want to avoid separate MPI distributions since we compile many
MPI software packages. Having more than one MPI distribution
(at least) doubles the amount of work.

For now I came up with the following solution:

1. compile openmpi using gfortran as the Fortran compiler
   and install it in /usr/local/openmpi
2. move the Fortran module to the directory
   /usr/local/openmpi/include/gfortran. In that directory
   create softlinks to the files in /usr/local/openmpi/include.
3. compile openmpi using ifort and install the Fortran module in
   /usr/local/openmpi/include.
4. in /usr/local/openmpi/bin create softlinks mpif90.ifort
   and mpif90.gfortran pointing to opal_wrapper. Remove the
   mpif90 softlink.
5. Move /usr/local/openmpi/share/openmpi/mpif90-wrapper-data.txt
   to /usr/local/openmpi/share/openmpi/mpif90.ifort-wrapper-data.txt.
   Copy the file to
   /usr/local/openmpi/share/openmpi/mpif90.gfortran-wrapper-data.txt
   and change the line includedir=${includedir} to
   includedir=${includedir}/gfortran
6. Create a wrapper script /usr/local/openmpi/bin/mpif90:

#!/bin/bash
OMPI_WRAPPER_FC=`basename $OMPI_FC 2> /dev/null`
if [ "$OMPI_WRAPPER_FC" = 'gfortran' ]; then
   exec $0.gfortran "$@"
else
   exec $0.ifort "$@"
fi
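
With this in place the compiler should be selectable per invocation through
OMPI_FC, e.g. "OMPI_FC=gfortran mpif90 foo.f90" (the script takes the basename,
so a full path works too); anything else, including an unset OMPI_FC, falls
through to the ifort wrapper.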




Re: [OMPI users] Network connection check

2009-07-23 Thread Ralph Castain
It depends on which network fails. If you lose all TCP connectivity,  
Open MPI should abort the job as the out-of-band system will detect  
the loss of connection. If you only lose the MPI connection (whether  
TCP or some other interconnect), then I believe the system will  
eventually generate an error after it retries sending the message a  
specified number of times, though it may not abort.



On Jul 22, 2009, at 10:55 PM, vipin kumar wrote:

Are you asking to find out this information before issuing  
"mpirun"?  Open MPI does assume that the nodes you are trying to use  
are reachable.



 NO,

Scenario is a pair of processes are running one in "master" node say  
"masterprocess" and one in "slave" node say "slaveprocess". When  
"masterprocess" needs service of slave process, it sends message to  
"slaveprocess" and "slaveprocess" serves its request. In case of  
Network failure(by any means) "masterprocess" will keep trying to  
send message to "slaveprocess" without knowing that it is not  
reachable. So how "masterprocess" should finds out that  
"slaveprocess" can't be reached and leave attempting to send  
messages till Connection is not up.



Thanks & Regards,
--
Vipin K.
Research Engineer,
C-DOTB, India




Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-23 Thread Dave Love
Jeff Squyres  writes:

> The MPI ABI has not changed since 1.3.2.

Good, thanks.  I hadn't had time to investigate the items in the release
notes that looked suspicious.  Are there actually any known ABI
incompatibilities between 1.3.0 and 1.3.2?  We haven't noticed any as
far as I know.

> Note that our internal API's are *not* guaranteed to be ABI compatible
> between releases

Sure.

Thanks for clarifying.  I assumed there was a missing negative in the
previous answer about it, but it's worth spelling out.


Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres  writes:

> See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257.

Ah, neat.  I'd never thought of that, possibly due to ELF not being
relevant when I first started worrying about that sort of thing.

> Indeed.  In OMPI, we tried to make this as simple as possible.  But
> unless you use specific compiler options to hide their differences, it
> isn't possible and is beyond our purview to fix.  :-(

Sure.  It was a question of whether it's just the interface, in which
case flags may help with Fortran.

> (similar situation with the C++ bindings)

I'd have expected it to be worse, since compilers intentionally have
inconsistent name-mangling as I understand it, but I'm not clever enough
to understand C++ anyway :-/.


Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres  writes:

> I *think* that there are compiler flags that you can use with ifort to
> make it behave similarly to gfortran in terms of sizes and constant
> values, etc.

At a slight tangent, if there are flags that might be helpful to add to
gfortran for compatibility (e.g. logical constants), I might be able to
do it, though I've not been involved since g77 and haven't had much
truck with such interface issues for a while.  Does anyone know of any
relevant incompatibilities that aren't covered by items in the README?


Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain  wrote:

> It depends on which network fails. If you lose all TCP connectivity, Open
> MPI should abort the job as the out-of-band system will detect the loss of
> connection. If you only lose the MPI connection (whether TCP or some other
> interconnect), then I believe the system will eventually generate an error
> after it retries sending the message a specified number of times, though it
> may not abort.
>
>
Thank you Ralph,

From your reply I came to know that the question I posted earlier was not
reflecting the problem properly.

I can't use blocking communication routines in my main program
("masterprocess") because any type of network failure (whether in physical
connectivity, TCP connectivity or the MPI connection, as you described) may
occur. So I am using non-blocking point-to-point communication routines, and
TEST later for completion of that request. Once I enter the TEST loop I will
test for request completion until a TIMEOUT expires. Suppose the TIMEOUT has
occurred; in that case I will first check whether

 1:  the slave machine is reachable or not (how will I do that? Given: I have
the IP address and host name of the slave machine), and

 2:  if reachable, whether the programs (orted and "slaveprocess") are alive
or not.

I don't want to abort my master process in case 1; I hope that the network
connection will come up again in the future. Fortunately Open MPI doesn't
abort either process -- both processes can run independently without
communicating.
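
Roughly, the TEST loop I have in mind looks like the following minimal sketch
(message size, tag and TIMEOUT value are placeholders, and the communicator's
error handler is switched to MPI_ERRORS_RETURN so that a transport failure
shows up as a return code instead of aborting the job):

#include <mpi.h>
#include <stdio.h>

#define TIMEOUT_SECS 30.0                 /* placeholder timeout */

/* Post a non-blocking send and poll MPI_Test against a wall-clock timeout.
 * Returns 0 on completion, 1 on timeout (request still pending),
 * -1 if MPI itself reports an error (e.g. lost TCP connectivity).          */
static int send_with_timeout(char *msg, int len, int dest, int tag)
{
    MPI_Request req;
    int done = 0, rc;
    double t0;

    MPI_Isend(msg, len, MPI_CHAR, dest, tag, MPI_COMM_WORLD, &req);

    t0 = MPI_Wtime();
    while (!done) {
        rc = MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS)
            return -1;                    /* failure reported by the library  */
        if (!done && MPI_Wtime() - t0 > TIMEOUT_SECS)
            return 1;                     /* not failed, just not finished:
                                             keep the request and retest later */
    }
    return 0;
}

int main(int argc, char **argv)
{
    int rank;
    char buf[8] = "ping";

    MPI_Init(&argc, &argv);
    /* report MPI errors as return codes instead of aborting the whole job */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("send status: %d\n", send_with_timeout(buf, 5, 1, 0));
    else if (rank == 1)
        MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}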


Thanks and Regards,
-- 
Vipin K.
Research Engineer,
C-DOTB, India


Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-23 Thread Jeff Squyres

On Jul 23, 2009, at 6:39 AM, Dave Love wrote:


> The MPI ABI has not changed since 1.3.2.

Good, thanks.  I hadn't had time to investigate the items in the release
notes that looked suspicious.  Are there actually any known ABI
incompatibilities between 1.3.0 and 1.3.2?  We haven't noticed any as
far as I know.



It *might* work?  To be honest, I would be surprised, though -- it may  
fail in subtle, non-obvious ways (i.e., during execution, not startup/ 
linking).  We made some changes in 1.3.2 in order to freeze the ABI  
for the future that *probably* have disruptive effects in seamlessly  
working with prior versions (there were some strange technical issues  
involving OMPI's use of pointers for MPI handles -- I can explain more  
if you care).


FWIW: the changes we made were in the back-end/internals of libmpi;  
source-code compatibility has been maintained since MPI-1.0 (aside  
from a handful of bugs in the MPI API that we have fixed over time --  
e.g., a wrong parameter type in an MPI API function, etc.).


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Jeff Squyres
FWIW, for the Fortran MPI programmers out there, the MPI Forum is hard  
at work on a new Fortran 03 set of bindings for MPI-3.  We have a  
prototype in a side branch of Open MPI that is "mostly" working.  We  
(the MPI Forum) expect to release a short document describing the new  
features and the prototype Open MPI implementation for larger Fortran  
community comment within a few months.



On Jul 23, 2009, at 7:03 AM, Dave Love wrote:


Jeff Squyres  writes:

> See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257.

Ah, neat.  I'd never thought of that, possibly due to ELF not being
relevant when I first started worrying about that sort of thing.

> Indeed.  In OMPI, we tried to make this as simple as possible.  But
> unless you use specific compiler options to hide their differences, it
> isn't possible and is beyond our purview to fix.  :-(

Sure.  It was a question of whether it's just the interface, in which
case flags may help with Fortran.

> (similar situation with the C++ bindings)

I'd have expected it to be worse, since compilers intentionally have
inconsistent name-mangling as I understand it, but I'm not clever enough
to understand C++ anyway :-/.




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Network connection check

2009-07-23 Thread Jeff Squyres

On Jul 23, 2009, at 7:36 AM, vipin kumar wrote:

I can't use blocking communication routines in my main program  
( "masterprocess") because any type of network failure( may be due  
to physical connectivity or TCP connectivity or MPI connection as  
you told) may occur. So I am using non blocking point to point  
communication routines, and TEST later for completion of that  
Request. Once I enter a TEST loop I will test for Request complition  
till TIMEOUT. Suppose TIMEOUT has occured, In this case first I will  
check whether


Open MPI should return a failure if TCP connectivity is lost, even  
with a non-blocking point-to-point operation.  The failure should be  
returned in the call to MPI_TEST (and friends).  So I'm not sure your  
timeout has meaning here -- if you reach the timeout, I think it  
simply means that the MPI communication has not completed yet.  It  
does not necessarily mean that the MPI communication has failed.


 1:  Slave machine is reachable or not,  (How I will do that ???  
Given - I have IP address and Host Name of Slave machine.)


 2:  if reachable, check whether program(orted and "slaveprocess")  
is alive or not.


MPI doesn't provide any standard way to check reachability and/or  
health of a peer process.


That being said, I think some of the academics are working on more  
fault tolerant / resilient MPI messaging, but I don't know if they're  
ready to talk about such efforts publicly yet.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"

2009-07-23 Thread Igor Kozin
Hi Gus,
I played with collectives a few months ago. Details are here
http://www.cse.scitech.ac.uk/disco/publications/WorkingNotes.ConnectX.pdf
That was in the context of 1.2.6

You can get available tuning options by doing
ompi_info -all -mca coll_tuned_use_dynamic_rules 1 | grep alltoall
and similarly for other collectives.

Best,
Igor

2009/7/23 Gus Correa :
> Dear OpenMPI experts
>
> I would like to experiment with the OpenMPI tuned collectives,
> hoping to improve the performance of some programs we run
> in production mode.
>
> However, I could not find any documentation on how to select the
> different collective algorithms and other parameters.
> In particular, I would love to read an explanation clarifying
> the syntax and meaning of the lines on "dyn_rules"
> file that is passed to
> "-mca coll_tuned_dynamic_rules_filename ./dyn_rules"
>
> Recently there was an interesting discussion on the list
> about this topic.  It showed that choosing the right collective
> algorithm can make a big difference in overall performance:
>
> http://www.open-mpi.org/community/lists/users/2009/05/9355.php
> http://www.open-mpi.org/community/lists/users/2009/05/9399.php
> http://www.open-mpi.org/community/lists/users/2009/05/9401.php
> http://www.open-mpi.org/community/lists/users/2009/05/9419.php
>
> However, the thread was concentrated on "MPI_Alltoall".
> Nothing was said about other collective functions.
> Not much was said about the
> "tuned collective dynamic rules" file syntax,
> the meaning of its parameters, etc.
>
> Is there any source of information about that which I missed?
> Thank you for any pointers or clarifications.
>
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -



Re: [OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything

2009-07-23 Thread Jeff Squyres

On Jul 22, 2009, at 3:17 AM, Alexey Sokolov wrote:

from /home/user/NetBeansProjects/Correlation_orig/Correlation/Correlation.cpp:2:
/usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything





That's fairly odd, but if your program is not using the C++ bindings  
for MPI generalized requests, it won't matter.


But as Jody noted, updating to Open MPI v1.3.3 is a better bet,  
anyway.  Distro-default packages are great and convenient, but Open  
MPI releases at a faster pace than distros.  It's annoying, but  
sometimes necessary to upgrade (especially if you're starting new and  
have no legacy reasons to stick with older software).


FWIW: we slightly changed the routine that was issuing the warning to  
you in 1.3.3.


Also, be aware that the MPI Forum is likely to deprecate the C++  
bindings in MPI-2.2.  They won't go away in MPI-2.2, but they may well  
go away in MPI-3.  Open MPI (and others) will likely still include C++  
binding functionality for a long time (to keep legacy codes still  
running), but they will become relegated to a minor subsystem.


--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] Network connection check

2009-07-23 Thread jody
Maybe you could make a system call to ping the other machine.
char sCommand[512];
// build the ping command; -c sets the number of pings, -q keeps ping quiet
snprintf(sCommand, sizeof(sCommand), "ping -c %d -q %s > /dev/null", numPings, sHostName);
// execute the command; system() returns the command's exit status
int iResult = system(sCommand);

If the ping was successful, iResult will have the value 0.

Jody

On Thu, Jul 23, 2009 at 1:36 PM, vipin kumar wrote:
>
>
> On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain  wrote:
>>
>> It depends on which network fails. If you lose all TCP connectivity, Open
>> MPI should abort the job as the out-of-band system will detect the loss of
>> connection. If you only lose the MPI connection (whether TCP or some other
>> interconnect), then I believe the system will eventually generate an error
>> after it retries sending the message a specified number of times, though it
>> may not abort.
>
> Thank you Ralph,
>
> From your reply I came to know that the question I posted earlier was not
> reflecting the problem properly.
>
> I can't use blocking communication routines in my main program (
> "masterprocess") because any type of network failure( may be due to physical
> connectivity or TCP connectivity or MPI connection as you told) may occur.
> So I am using non blocking point to point communication routines, and TEST
> later for completion of that Request. Once I enter a TEST loop I will test
> for Request complition till TIMEOUT. Suppose TIMEOUT has occured, In this
> case first I will check whether
>
>  1:  Slave machine is reachable or not,  (How I will do that ??? Given - I
> have IP address and Host Name of Slave machine.)
>
>  2:  if reachable, check whether program(orted and "slaveprocess") is alive
> or not.
>
> I don't want to abort my master process in case 1 and hope that network
> connection will come up in future. Fortunately OpenMPI doesn't abort any
> process. Both processes can run independently without communicating.
>
>
> Thanks and Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
>



Re: [OMPI users] Network connection check

2009-07-23 Thread Prentice Bisbal

Jeff Squyres wrote:
> On Jul 22, 2009, at 10:05 AM, vipin kumar wrote:
> 
>> Actually requirement is how a C/C++ program running in "master" node
>> should find out whether "slave" node is reachable (as we check this
>> using "ping" command) or not ? Because IP address may change at any
>> time, that's why I am trying to achieve this using "host name" of the
>> "slave" node. How this can be done?
> 
> 
> Are you asking to find out this information before issuing "mpirun"? 
> Open MPI does assume that the nodes you are trying to use are reachable.
> 


How about you start your MPI program from a shell script that does the
following:

1. Reads a text file containing the names of all the possible candidates
 for MPI nodes

2. Loops through the list of names from (1) and pings each machine to
see if it's alive. If the host is pingable, then write its name to a
different text file which will be used as the machine file for the
mpirun command

3. Call mpirun using the machine file generated in (2).
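
If it helps, here is a rough C sketch of steps (1) and (2), reusing the
system("ping ...") approach suggested elsewhere in this thread; the file names
"candidates.txt" and "machinefile" are placeholders:

/* preflight.c: read candidate host names, keep only the ones that answer ping. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *in  = fopen("candidates.txt", "r");   /* one host name per line */
    FILE *out = fopen("machinefile", "w");      /* hosts that answered    */
    char host[256], cmd[512];

    if (in == NULL || out == NULL) {
        perror("fopen");
        return 1;
    }
    while (fgets(host, sizeof(host), in)) {
        host[strcspn(host, "\r\n")] = '\0';     /* strip the newline */
        if (host[0] == '\0')
            continue;
        snprintf(cmd, sizeof(cmd), "ping -c 1 -q %s > /dev/null 2>&1", host);
        if (system(cmd) == 0)                   /* exit status 0: host answered */
            fprintf(out, "%s\n", host);
    }
    fclose(in);
    fclose(out);
    return 0;   /* then: mpirun -machinefile machinefile ... */
}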

--
Prentice


Re: [OMPI users] Network connection check

2009-07-23 Thread Bogdan Costescu

On Thu, 23 Jul 2009, vipin kumar wrote:


1:  Slave machine is reachable or not,  (How I will do that ??? Given - I
have IP address and Host Name of Slave machine.)

2:  if reachable, check whether program(orted and "slaveprocess") is alive
or not.


You don't specify and based on your description I infer that you are 
not using a batch/queueing system, but just a rsh/ssh based start-up 
mechanism. A batch/queueing system might be able to tell you whether a 
remote computer is still accessible.


I think that MPI is not the proper mechanism to achieve what you want. 
PVM or, maybe better, direct socket programming will probably serve 
you more.


--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
Thank you all -- Jeff, Jody, Prentice and Bogdan -- for your invaluable
clarifications, solutions and suggestions.

> Open MPI should return a failure if TCP connectivity is lost, even with a
> non-blocking point-to-point operation.  The failure should be returned in
> the call to MPI_TEST (and friends).


even if MPI_TEST is a local operation?


>  So I'm not sure your timeout has meaning here -- if you reach the timeout,
> I think it simply means that the MPI communication has not completed yet.
>  It does not necessarily mean that the MPI communication has failed.
>

You are absolutely correct, but the job should be done before the timeout
expires; that's the reason I am using a TIMEOUT.

So the conclusion is :

>
>  MPI doesn't provide any standard way to check reachability and/or health
> of a peer process.


That's what I wanted to confirm. And to find out the solution, if any, or
any alternative.

So now I think, I should go for Jody's approach


>
> How about you start your MPI program from a shell script that does the
> following:
>
> 1. Reads a text file containing the names of all the possible candidates
>  for MPI nodes
>
> 2. Loops through the list of names from (1) and pings each machine to
> see if it's alive. If the host is pingable, then write it's name to a
> different text file which will be host as the machine file for the
> mpirun command
>


>
> 3. Call mpirun using the machine file generated in (2).
>

I am assuming processes have been launched successfully.



-- 
Vipin K.
Research Engineer,
C-DOTB, India


[OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney

I have built OpenMPI 1.3.3 without support for SGE.
I just want to launch jobs with loose integration right
now.

Here is how I configured it:

./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 
--prefix=/opt/openmpi/1.3.3-pgi --without-sge
 --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 
--with-io-romio-flags=--with-file-system=lustre 
--enable-orterun-prefix-by-default


I can start jobs from the commandline just fine.  When
I try to do the same thing inside an SGE job, I get
errors like the following:


error: executing task of job 5041155 failed:
--
A daemon (pid 13324) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished


I am starting mpirun with the following options:

$OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
-machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl

The options are to ensure I am using IB, that SGE is not used, and that
the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done 
correctly.


This worked with 1.2.7 (except setting the pls option as gridengine 
instead of sge), but I can't get it to work with 1.3.3.


Am I missing something obvious for getting jobs with loose integration
started?

Thanks,
Craig



Re: [OMPI users] Network connection check

2009-07-23 Thread Durga Choudhury
The 'system' command will fork a separate process to run. If I
remember correctly, forking within MPI can lead to undefined behavior.
Can someone in OpenMPI development team clarify?

What I don't understand is: why is your TCP network so unstable that
you are worried about reachability? For MPI to run, they should be
connected on a local switch with a high bandwidth interconnect and not
dispersed across the internet. Perhaps you should look at the
underlying cause of network instability. If your network is actually
stable, then your problem is only theoretical.

Also, keep in mind that TCP itself offers a keepalive mechanism. Three
parameters may be specified: the amount of inactivity after which the
first probe is sent, the number of unanswered probes after which the
connection is dropped and the interval between the probes. Typing
'sysctl -a' will print the entire IP MIB that has these names (I don't
remember them off the top of my head). However, you say that you
*don't* want to drop the connection, simply want to know about
connectivity. What you can do, without causing 'undefined' MPI
behaviour is to implement a similar mechanism in your MPI application.

Durga


On Thu, Jul 23, 2009 at 10:25 AM, vipin kumar wrote:
> Thank you all Jeff, Jody, Prentice and Bogdan for your invaluable
> clarification, solution and suggestion,
>
>> Open MPI should return a failure if TCP connectivity is lost, even with a
>> non-blocking point-to-point operation.  The failure should be returned in
>> the call to MPI_TEST (and friends).
>
> even if MPI_TEST is a local operation?
>
>>
>>  So I'm not sure your timeout has meaning here -- if you reach the
>> timeout, I think it simply means that the MPI communication has not
>> completed yet.  It does not necessarily mean that the MPI communication has
>> failed.
>
> you are absolutely correct., but the job should be done before it expires.
> that's the reason I am using TIMEOUT.
>
> So the conclusion is :
>>
>> MPI doesn't provide any standard way to check reachability and/or health
>> of a peer process.
>
> That's what I wanted to confirm. And to find out the solution, if any, or
> any alternative.
>
> So now I think, I should go for Jody's approach
>
>>
>> How about you start your MPI program from a shell script that does the
>> following:
>>
>> 1. Reads a text file containing the names of all the possible candidates
>>  for MPI nodes
>>
>> 2. Loops through the list of names from (1) and pings each machine to
>> see if it's alive. If the host is pingable, then write it's name to a
>> different text file which will be host as the machine file for the
>> mpirun command
>
>
>>
>> 3. Call mpirun using the machine file generated in (2).
>
> I am assuming processes have been launched successfully.
>
>
>
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
>



Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Rolf Vandevaart

I think what you are looking for is this:

--mca plm_rsh_disable_qrsh 1

This means we will disable the use of qrsh and use rsh or ssh instead.

The --mca pls ^sge does not work anymore for two reasons.  First, the
"pls" framework was renamed "plm".  Secondly, the gridengine plm was
folded into the rsh/ssh one.
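
For Craig's command line that would presumably look something like
"$OMPI/bin/mpirun -mca plm_rsh_disable_qrsh 1 -mca btl openib,sm,self
-machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl".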


A few more details at
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge

Rolf

On 07/23/09 10:34, Craig Tierney wrote:

I have built OpenMPI 1.3.3 without support for SGE.
I just want to launch jobs with loose integration right
now.

Here is how I configured it:

./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 
--prefix=/opt/openmpi/1.3.3-pgi --without-sge
 --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 
--with-io-romio-flags=--with-file-system=lustre 
--enable-orterun-prefix-by-default


I can start jobs from the commandline just fine.  When
I try to do the same thing inside an SGE job, I get
errors like the following:


error: executing task of job 5041155 failed:
--
A daemon (pid 13324) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished


I am starting mpirun with the following options:

$OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
-machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl

The options are to ensure I am using IB, that SGE is not used, and that
the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done 
correctly.


This worked with 1.2.7 (except setting the pls option as gridengine 
instead of sge), but I can't get it to work with 1.3.3.


Am I missing something obvious for getting jobs with loose integration
started?

Thanks,
Craig




--

=
rolf.vandeva...@sun.com
781-442-3043
=


Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
> You don't specify and based on your description I infer that you are not
> using a batch/queueing system, but just a rsh/ssh based start-up mechanism.


You are absolutely correct. I am using an rsh/ssh based start-up mechanism.

> A batch/queueing system might be able to tell you whether a remote computer
> is still accessible.
>

Right now I don't have any idea about batch/queueing systems; I will explore
that as well. And I think you mean checking before the jobs are launched.

>
> I think that MPI is not the proper mechanism to achieve what you want. PVM
> or, maybe better, direct socket programming will probably serve you more.


I will think about these also.

I have already spent a significant amount of time on LAM/MPI and Open MPI and
due to lack of time I don't want to switch to another mechanism. Anyway, Open
MPI is doing great for me -- at least 80% of what I want.



Thanks & Regards,
-- 
Vipin K.
Research Engineer,
C-DOTB, India


Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney
Rolf Vandevaart wrote:
> I think what you are looking for is this:
> 
> --mca plm_rsh_disable_qrsh 1
> 
> This means we will disable the use of qrsh and use rsh or ssh instead.
> 
> The --mca pls ^sge does not work anymore for two reasons.  First, the
> "pls" framework was renamed "plm".  Secondly, the gridgengine plm was
> folded into the rsh/ssh one.
> 

Rolf,

Thanks for the quick reply.  That solved the problem.

Craig


> A few more details at
> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
> 
> Rolf
> 
> On 07/23/09 10:34, Craig Tierney wrote:
>> I have built OpenMPI 1.3.3 without support for SGE.
>> I just want to launch jobs with loose integration right
>> now.
>>
>> Here is how I configured it:
>>
>> ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90
>> --prefix=/opt/openmpi/1.3.3-pgi --without-sge
>>  --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1
>> --with-io-romio-flags=--with-file-system=lustre
>> --enable-orterun-prefix-by-default
>>
>> I can start jobs from the commandline just fine.  When
>> I try to do the same thing inside an SGE job, I get
>> errors like the following:
>>
>>
>> error: executing task of job 5041155 failed:
>> --
>>
>> A daemon (pid 13324) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --
>>
>> --
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>>
>> mpirun: clean termination accomplished
>>
>>
>> I am starting mpirun with the following options:
>>
>> $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
>> -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl
>>
>> The options are to ensure I am using IB, that SGE is not used, and that
>> the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done
>> correctly.
>>
>> This worked with 1.2.7 (except setting the pls option as gridengine
>> instead of sge), but I can't get it to work with 1.3.3.
>>
>> Am I missing something obvious for getting jobs with loose integration
>> started?
>>
>> Thanks,
>> Craig
>>
> 
> 


-- 
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Profiling performance by forcing transport choice.

2009-07-23 Thread Eugene Loh




Nifty Tom Mitchell wrote:

> On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote:
>> Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2
>> forcing all MPI traffic over Ethernet instead of using Infiniband
>
> While the previous thread on "performance reduction" went left, right,
> forward and beyond the initial topic it tickled an idea for application
> profiling or characterizing.
>
> What if the various transports (btl) had knobs that permitted stepwise
> insertion of bandwidth limits and latency limits etc. so the application
> might be characterized better?

I'm unclear what you're asking about.  Are you asking that a BTL would
limit the performance delivered to the application?  E.g., the
interconnect is capable of 1 Gbyte/sec, but you only deliver 100
Mbyte/sec (or whatever the user selects) to the app so the user can see
whether bandwidth is a sensitive parameter for the app?

If so, I have a few thoughts.

1)  The actual limitations of an MPI implementation may hard to model. 
E.g., the amount of handshaking between processes, synchronization
delays, etc.

2)  For the most part, you could (actually even should) try doing this
stuff much higher up than the BTLs.  E.g., how about developing a PMPI
layer that does what you're talking about?  (A rough sketch of such a
layer follows below.)

3)  I think folks have tried this sort of thing in the past by
instrumenting the code and then "playing it back" or "simulating" with
other performance parameters.  E.g., "I run for X cycles, then I send a
N-byte message, then compute another Y cycles, then post a receive,
then ..." and then turn the knobs for latency, bandwidth, etc., to see
at what point any of these become sensitive parameters.  You might
see:  gosh, as long as latency is lower than about 30-70 usec, it
really isn't important.  Or, whatever.  Off hand, I think different
people have tried this approach and (without bothering to check my
notes to see if my memory is any good) I think Dimemas (associated
with Paraver and CEPBA Barcelona) was one such tool.
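
To make item 2 concrete, here is a bare-bones sketch of such a PMPI
interposition layer.  It adds an artificial delay to every MPI_Send, taken
from an environment variable whose name (FAKE_LATENCY_US) is made up for this
sketch; compile it to an object file and list it ahead of the MPI library on
the link line so the application's sends pass through it:

/* fakelat.c: wrap MPI_Send via the PMPI profiling interface and add a
 * fixed artificial delay before every send.                            */
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

static int delay_us = -1;     /* cached artificial latency, in microseconds */

int MPI_Send(void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    if (delay_us < 0) {
        const char *e = getenv("FAKE_LATENCY_US");
        delay_us = e ? atoi(e) : 0;
    }
    if (delay_us > 0)
        usleep(delay_us);     /* pretend the interconnect is slower */

    return PMPI_Send(buf, count, type, dest, tag, comm);   /* the real send */
}

A real study would of course wrap MPI_Isend, MPI_Recv, the collectives, and so
on in the same way.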

> Most micro benchmarks are designed to measure various hardware characteristics
> but it is moderately hard to know what an application depends on.
>
> The value of this is that:
>    *the application authors might learn something
>    about their code that is hard to know at a well
>    abstracted API level.
>
>    *the purchasing decision maker would have the ability
>    to access a well instrumented cluster and build a
>    weighted value equation to help structure the decision.
>
>    *the hardware vendor can learn what is valuable when deciding
>    what feature and function needs the most attention/transistors.
>
> i.e. it might be as valuable to benchmark "your code" on a single well
> instrumented platform as it might be to benchmark all the hardware you
> can get "yer hands on".






[OMPI users] TCP btl misbehaves if btl_tcp_port_min_v4 is not set.

2009-07-23 Thread Eric Thibodeau
Hello all,

   (this _might_ be related to https://svn.open-mpi.org/trac/ompi/ticket/1505)

   I just compiled and installed 1.3.3 in a CentOS 5 environment and we noticed the
processes would deadlock as soon as they would start using TCP communications. The
test program is one that has been running on other clusters for years with no
problems. Furthermore, using local cores doesn't deadlock the process, whereas forcing
inter-node communications (-bynode scheduling) immediately causes the problem.

Symptoms:
- processes don't crash or die, they use 100% CPU in system space (as opposed to
user space)
- stracing one of the processes will show it is freewheeling in a polling loop.
- executing with --mca btl_base_verbose 30 will show weird port assignments;
either they are wrong or they should be interpreted as an offset from the default
btl_tcp_port_min_v4 (1024).
- The error "mca_btl_tcp_endpoint_complete_connect] connect() to  failed: No
route to host (113)" _may_ be seen. We noticed it only showed up if we had vmnet
interfaces up and running on certain nodes. Note that setting

 oob_tcp_listen_mode=listen_thread
 oob_tcp_if_include=eth0
 btl_tcp_if_include=eth0

was one of our first reactions to this, to no avail.

Workaround we found:

While keeping the above-mentioned MCA parameters, we added btl_tcp_port_min_v4=2000
due to some firewall rules (which we had obviously disabled as part of the
troubleshooting process) and noticed everything seemed to start working correctly
from here on.

This seems to work, but I can find no logical explanation as the code seems to be
clean in that respect.

Some pasting for people searching frantically for a solution:

[cluster-srv2:20379] btl: tcp: attempting to connect() to address 10.194.32.113 on port 2052
[cluster-srv2:20381] btl: tcp: attempting to connect() to address 10.194.32.113 on port 3076
[cluster-srv2:20377] btl: tcp: attempting to connect() to address 10.194.32.113 on port 260
[cluster-srv2:20383] btl: tcp: attempting to connect() to address 10.194.32.113 on port 3588
[cluster-srv1:19900] btl: tcp: attempting to connect() to address 10.194.32.117 on port 1540
[cluster-srv2:20377] btl: tcp: attempting to connect() to address 10.194.32.117 on port 2052
[cluster-srv2:20383] btl: tcp: attempting to connect() to address 10.194.32.117 on port 3076
[cluster-srv1:19894] btl: tcp: attempting to connect() to address 10.194.32.117 on port 516
[cluster-srv2:20379] btl: tcp: attempting to connect() to address 10.194.32.117 on port 3588
[cluster-srv1:19898] btl: tcp: attempting to connect() to address 10.194.32.117 on port 1028
[cluster-srv2:20381] btl: tcp: attempting to connect() to address 10.194.32.117 on port 2564
[cluster-srv1:19896] btl: tcp: attempting to connect() to address 10.194.32.117 on port 4
[cluster-srv3:13665] btl: tcp: attempting to connect() to address 10.194.32.115 on port 1028
[cluster-srv3:13663] btl: tcp: attempting to connect() to address 10.194.32.115 on port 4
[cluster-srv2][[44096,1],9][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
[cluster-srv2][[44096,1],13][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 10.194.32.117 failed: No route to host (113)
connect() to 10.194.32.117 failed: No route to host (113)
[cluster-srv3][[44096,1],20][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 10.194.32.115 failed: No route to host (113)

Cheers!

Eric Thibodeau



Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?

2009-07-23 Thread Song, Kai Song
Hi Ralph,

Thanks for the fast reply! I put the --display-allocation and --display-map
flags on and it looks like the node allocation is just fine, but the job still
hangs.

The output looks like this:
 /home/kaisong/test
node0001
node0001
node
node
Starting parallel job

==   ALLOCATED NODES   ==

 Data for node: Name: node0001  Num slots: 2Max slots: 0
 Data for node: Name: node  Num slots: 2Max slots: 0

=

    JOB MAP   

 Data for node: Name: node0001  Num procs: 2
Process OMPI jobid: [16591,1] Process rank: 0
Process OMPI jobid: [16591,1] Process rank: 1

 Data for node: Name: node  Num procs: 2
Process OMPI jobid: [16591,1] Process rank: 2
Process OMPI jobid: [16591,1] Process rank: 3

 =
(no hello world output; the job just hangs here until timeout).
And a similar thing appears in the error output:
node - daemon did not report back when launched


Then, I ran the job manually by adding "-mca btl gm" flag for mpirun:
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca gm --display-allocation 
--display-map -v -machinefile ./node -np 4 ./hello-hostname

MPI crashed with the following output/error:
==   ALLOCATED NODES   ==

  Data for node: Name: hbar.lbl.gov  Num slots: 0Max slots: 0
  Data for node: Name: node0045  Num slots: 4Max slots: 0
  Data for node: Name: node0046  Num slots: 4Max slots: 0
  Data for node: Name: node0047  Num slots: 4Max slots: 0
  Data for node: Name: node0048  Num slots: 4Max slots: 0

=

     JOB MAP   

  Data for node: Name: node0045  Num procs: 4
 Process OMPI jobid: [62741,1] Process rank: 0
 Process OMPI jobid: [62741,1] Process rank: 1
 Process OMPI jobid: [62741,1] Process rank: 2
 Process OMPI jobid: [62741,1] Process rank: 3

  =
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

   Process 1 ([[62741,1],1]) is on host: node0045
   Process 2 ([[62741,1],1]) is on host: node0045
   BTLs attempted: gm

Your MPI job is now going to abort; sorry.
--
--
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or 
environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:366] Abort before MPI_INIT completed successfully; not able 
to guarantee that all other process
!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:367] Abort before MPI_INIT completed successfully; not able 
to guarantee that all other process
!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:368] Abort before MPI_INIT completed successfully; not able 
to guarantee that all other process
!


*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:365] Abort before MPI_INIT completed successfully; not able 
to guarantee that all other process
!
--
mpirun has exited due to process rank 3 with PID 368 on
node node0045 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[hbar.lbl.gov:07770] 3 more processes have

Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?

2009-07-23 Thread Ralph Castain
My apologies - I had missed that -mca btl flag. That is the source of  
the trouble. IIRC, GM doesn't have a loopback method in it. OMPI  
requires that -every- proc be able to reach -every- proc, including  
itself.


So you must include the "self" btl at a minimum. Also, if you want  
more performance, you probably want to include the shared memory BTL  
as well.


So the recommended param would be:

-mca btl gm,sm,self

Order doesn't matter. I'm disturbed that it would hang when you run in  
batch, though, instead of abort. Try with this new flag and see if it  
runs in both batch and interactive mode.


HTH
Ralph

On Jul 23, 2009, at 1:10 PM, Song, Kai Song wrote:


Hi Ralph,

Thanks for the fast reply! I put the --display-allocation and -- 
display-map flags on and it looks like the nodes allocation is just  
fine, but the job still hang.


The output looks like this:
/home/kaisong/test
node0001
node0001
node
node
Starting parallel job

==   ALLOCATED NODES   ==

Data for node: Name: node0001   Num slots: 2Max slots: 0
Data for node: Name: node   Num slots: 2Max slots: 0

=

   JOB MAP   

Data for node: Name: node0001   Num procs: 2
Process OMPI jobid: [16591,1] Process rank: 0
Process OMPI jobid: [16591,1] Process rank: 1

Data for node: Name: node   Num procs: 2
Process OMPI jobid: [16591,1] Process rank: 2
Process OMPI jobid: [16591,1] Process rank: 3

=
(no hello wrold output, job just hang here until timeout).
And similar thing in the error output:
node - daemon did not report back when launched


Then, I ran the job manually by adding "-mca btl gm" flag for mpirun:
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca gm --display- 
allocation --display-map -v -machinefile ./node -np 4 ./hello-hostname


MPI crashed with the following output/error:
==   ALLOCATED NODES   ==

 Data for node: Name: hbar.lbl.gov  Num slots: 0Max slots: 0
 Data for node: Name: node0045  Num slots: 4Max slots: 0
 Data for node: Name: node0046  Num slots: 4Max slots: 0
 Data for node: Name: node0047  Num slots: 4Max slots: 0
 Data for node: Name: node0048  Num slots: 4Max slots: 0

=

    JOB MAP   

 Data for node: Name: node0045  Num procs: 4
Process OMPI jobid: [62741,1] Process rank: 0
Process OMPI jobid: [62741,1] Process rank: 1
Process OMPI jobid: [62741,1] Process rank: 2
Process OMPI jobid: [62741,1] Process rank: 3

 =
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[62741,1],1]) is on host: node0045
  Process 2 ([[62741,1],1]) is on host: node0045
  BTLs attempted: gm

Your MPI job is now going to abort; sorry.
--
--
--
--
It looks like MPI_INIT failed for some reason; your parallel process  
is

likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:366] Abort before MPI_INIT completed successfully; not able
to guarantee that all other process
!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:367] Abort before MPI_INIT completed successfully; not able
to guarantee that all other process
!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[node0045:368] Abort before MPI_INIT completed successfully; not able
to guarantee that all ot

Re: [OMPI users] Receiving an unknown number of messages

2009-07-23 Thread Shaun Jackman

Eugene Loh wrote:

Shaun Jackman wrote:

For my MPI application, each process reads a file and for each line 
sends a message (MPI_Send) to one of the other processes determined by 
the contents of that line. Each process posts a single MPI_Irecv and 
uses MPI_Request_get_status to test for a received message. If a 
message has been received, it processes the message and posts a new 
MPI_Irecv. I believe this situation is not safe and prone to deadlock 
since MPI_Send may block. The receiver would need to post as many 
MPI_Irecv as messages it expects to receive, but it does not know in 
advance how many messages to expect from the other processes. How is 
this situation usually handled in an MPI appliation where the number 
of messages to receive is unknown?

...

Each process posts an MPI_Irecv to listen for in-coming messages.

Each process enters a loop in which it reads its file and sends out 
messages.  Within this loop, you also loop on MPI_Test to see if any 
message has arrived.  If so, process it, post another MPI_Irecv(), and 
keep polling.  (I'd use MPI_Test rather than MPI_Request_get_status 
since you'll have to call something like MPI_Test anyhow to complete the 
receive.)


Once you've posted all your sends, send out a special message to 
indicate you're finished.  I'm thinking of some sort of tree 
fan-in/fan-out barrier so that everyone will know when everyone is finished.


Keep polling on MPI_Test, processing further receives or advancing your 
fan-in/fan-out barrier.


So, the key ingredients are:

*) keep polling on MPI_Test and reposting MPI_Irecv calls to drain 
in-coming messages while you're still in your "send" phase
*) have another mechanism for processes to notify one another when 
they've finished their send phases


Hi Eugene,

Very astute. You've pretty much exactly described how it works now, 
particularly the loop around MPI_Test and MPI_Irecv to drain incoming 
messages. So, here's my worry, which I'll demonstrate with an example. 
We have four processes. Each calls MPI_Irecv once. Each reads one line 
of its file. Each sends one message with MPI_Send to some other 
process based on the line that it has read, and then goes into the 
MPI_Test/MPI_Irecv loop.


The events fall out in this order
2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 receives the message from 2, consuming its MPI_Irecv
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
and now processes 0 and 1 are deadlocked.

When I say `receives' above, I mean that Open MPI has received the 
message and copied it into the buffer passed to the MPI_Irecv call, 
but the application hasn't yet called MPI_Test. The next step would be 
for all the processes to call MPI_Test, but 0 and 1 are already 
deadlocked.


Cheers,
Shaun


[OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

2009-07-23 Thread Sims, James S. Dr.
I have an OpenMPI  program compiled with a version of OpenMPI built using the 
ifort 10.1
compiler. I can compile and run this code with no problem, using the 32 bit
version of ifort. And I can also submit batch jobs using torque with this 
32-bit code.
However, compiling the same code to produce a 64 bit executable produces a code
that runs correctly only in the simplest cases. It does not run correctly when 
run
under the torque batch queuing system, running for a while and then giving a
segmentation violation in a section of code that is fine in the 32 bit version.

I have to run the mpi multinode jobs using our torque batch queuing system,
but we do have the capability of running the jobs in an interactive batch 
environment.

If I do a qsub -I -l nodes=1:x4gb
I get an interactive session on the remote node assigned to my job. I can run 
the
job using either 
./MPI_li_64 or
mpirun -np 1 ./MPI_li_64
and the job runs successfully to completion. I can also
start an interactive shell using
qsub -I -l nodes=1:ppn=2:x4gb
and I will get a single dual processor (or greater node). On this single node,
mpirun -np 2 ./MPI_li_64 works.
However, if instead I ask for two nodes in my interactive batch node,
qsub -I -l nodes=2:x4gb,
Two nodes will be assigned to me but when I enter
mpirun -np 2 ./MPI_li_64
the job runs for a while, then fails with:
mpirun noticed that process rank 1 with PID 23104 on node n339 exited on signal
11 (Segmentation fault).

I can trace this in the intel debugger and see that the segmentation fault is
occurring in what should
be good code, and in code that executes with no problem when everything is 
compiled 32-bit. I am
at a loss for what could be preventing this code to run within the batch 
queuing environment in the
64-bit version.

Jim


[OMPI users] Interaction of MPI_Send and MPI_Barrier

2009-07-23 Thread Shaun Jackman

Hi,

Two processes run the following program:

request = MPI_Irecv
MPI_Send (to the other process)
MPI_Barrier
flag = MPI_Test(request)

Without the barrier, there's a race and MPI_Test may or may not return 
true, indicating whether the message has been received. With the 
barrier, is it guaranteed that the message will have been received, 
and MPI_Test will return true?


Cheers,
Shaun


Re: [OMPI users] Receiving an unknown number of messages

2009-07-23 Thread Eugene Loh

Shaun Jackman wrote:


Eugene Loh wrote:


Shaun Jackman wrote:

For my MPI application, each process reads a file and for each line 
sends a message (MPI_Send) to one of the other processes determined 
by the contents of that line. Each process posts a single MPI_Irecv 
and uses MPI_Request_get_status to test for a received message. If a 
message has been received, it processes the message and posts a new 
MPI_Irecv. I believe this situation is not safe and prone to 
deadlock since MPI_Send may block. The receiver would need to post 
as many MPI_Irecv as messages it expects to receive, but it does not 
know in advance how many messages to expect from the other 
processes. How is this situation usually handled in an MPI 
appliation where the number of messages to receive is unknown?



...


Each process posts an MPI_Irecv to listen for in-coming messages.

Each process enters a loop in which it reads its file and sends out 
messages.  Within this loop, you also loop on MPI_Test to see if any 
message has arrived.  If so, process it, post another MPI_Irecv(), 
and keep polling.  (I'd use MPI_Test rather than 
MPI_Request_get_status since you'll have to call something like 
MPI_Test anyhow to complete the receive.)


Once you've posted all your sends, send out a special message to 
indicate you're finished.  I'm thinking of some sort of tree 
fan-in/fan-out barrier so that everyone will know when everyone is 
finished.


Keep polling on MPI_Test, processing further receives or advancing 
your fan-in/fan-out barrier.


So, the key ingredients are:

*) keep polling on MPI_Test and reposting MPI_Irecv calls to drain 
in-coming messages while you're still in your "send" phase
*) have another mechanism for processes to notify one another when 
they've finished their send phases


Hi Eugene,

Very astute. You've pretty much exactly described how it works now, 
particularly the loop around MPI_Test and MPI_Irecv to drain incoming 
messages. So, here's my worry, which I'll demonstrate with an example. 
We have four processes. Each calls MPI_Irecv once. Each reads one line 
of its file. Each sends one message with MPI_Send to some other 
process based on the line that it has read, and then goes into the 
MPI_Test/MPI_Irecv loop.


The events fall out in this order
2 sends to 0 and does not block (0 has one MPI_Irecv posted)
3 sends to 1 and does not block (1 has one MPI_Irecv posted)
0 receives the message from 2, consuming its MPI_Irecv
1 receives the message from 3, consuming its MPI_Irecv
0 sends to 1 and blocks (1 has no more MPI_Irecv posted)
1 sends to 0 and blocks (0 has no more MPI_Irecv posted)
and now processes 0 and 1 are deadlocked.

When I say `receives' above, I mean that Open MPI has received the 
message and copied it into the buffer passed to the MPI_Irecv call, 
but the application hasn't yet called MPI_Test. The next step would be 
for all the processes to call MPI_Test, but 0 and 1 are already 
deadlocked.


I don't get it.  Processes should drain aggressively.  So, if 0 receives 
a message, it should immediately post the next MPI_Irecv.  Before 0 
posts a send, it should MPI_Test (and post the next MPI_Irecv if the 
test received a message).
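
A minimal sketch (fragments only) of that drain-before-you-send idea; MSG_LEN
and TAG are placeholders, and the termination handshake discussed earlier is
left out:

#include <mpi.h>
#include <string.h>

#define MSG_LEN 128                /* placeholder maximum message length */
#define TAG     1                  /* placeholder tag */

static char inbuf[MSG_LEN];
static MPI_Request recv_req;

/* (Re)post the single outstanding receive: call once after MPI_Init,
 * then again after every completed receive.                           */
static void post_recv(void)
{
    MPI_Irecv(inbuf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, TAG,
              MPI_COMM_WORLD, &recv_req);
}

/* Drain every message that has already arrived, reposting the receive
 * each time, so no sender is left waiting on us.                      */
static void drain(void)
{
    int flag = 1;
    while (flag) {
        MPI_Test(&recv_req, &flag, MPI_STATUS_IGNORE);
        if (flag) {
            /* handle_message(inbuf);  -- application-specific work */
            post_recv();
        }
    }
}

/* For every line read from the file: drain first, then send.
 * (MPI_Send here could also become MPI_Isend plus its own test loop.) */
static void send_line(const char *line, int dest)
{
    drain();
    MPI_Send((void *)line, (int)strlen(line) + 1, MPI_CHAR, dest, TAG,
             MPI_COMM_WORLD);
}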


Further, you could convert to MPI_Isend.

But maybe I'm missing something.


Re: [OMPI users] Interaction of MPI_Send and MPI_Barrier

2009-07-23 Thread Richard Treumann

No - it is not guaranteed. (it is highly probable though)

The return from the MPI_Send only guarantees that the data is safely held
somewhere other than the send buffer so you are free to modify the send
buffer. The MPI standard does not say where the data is to be held. It only
says that once the MPI_Test is successful, the data will have been
delivered to the receive buffer.

Consider this possible scenario:

MPI_Send is for a small message:
The data is sent toward the destination
To allow the MPI_Send to complete promptly, lib MPI makes a temporary copy
of the message
The MPI_Send returns once the copy is made
the message gets lost in the network
the MPI_Barrier does its communication without packet loss and completes
the call to MPI_Test returns false
the send side gets no ack for the lost message and eventually retransmits
it from the saved temp
This time it gets through
A later MPI_Test succeeds
An ack eventually gets back to the sender and it throws away the temp copy
of the message it was keeping in case a retransmit was needed

I am not saying any particular MPI library would work this way but it is
one kind of thing that a libmpi might do to give better performance while
maintaining the strict rules of MPI semantic.  Since the MPI_Barrier does
not make any guarantee about the destination status of sends done before
it, this kind of optimization is legitimate.

If you must know that the message is received once the barrier returns, you
need to MPI_Wait the message before you call barrier.
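
In code the difference is just this (a sketch; one int exchanged with a peer,
tag 0 as a placeholder):

#include <mpi.h>

/* Exchange one int with 'peer'.  Completing the receive with MPI_Wait
 * *before* the barrier is what guarantees the message has arrived by
 * the time MPI_Barrier returns.                                        */
static void exchange(int peer, int sendval, int *recvval)
{
    MPI_Request req;

    MPI_Irecv(recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Send(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

    /* Not sufficient: MPI_Barrier() followed by MPI_Test(&req, ...);
     * the flag may still come back false, as described above.          */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the receive first...  */
    MPI_Barrier(MPI_COMM_WORLD);         /* ...then it is certainly there  */
}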

Dick


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


users-boun...@open-mpi.org wrote on 07/23/2009 05:02:51 PM:

> [OMPI users] Interaction of MPI_Send and MPI_Barrier
>
> Shaun Jackman
>
> to:
>
> Open MPI
>
> 07/23/2009 05:04 PM
>
> Sent by:
>
> users-boun...@open-mpi.org
>
> Please respond to Open MPI Users
>
> Hi,
>
> Two processes run the following program:
>
> request = MPI_Irecv
> MPI_Send (to the other process)
> MPI_Barrier
> flag = MPI_Test(request)
>
> Without the barrier, there's a race and MPI_Test may or may not return
> true, indicating whether the message has been received. With the
> barrier, is it guaranteed that the message will have been received,
> and MPI_Test will return true?
>
> Cheers,
> Shaun
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

[OMPI users] Backwards compatibility?

2009-07-23 Thread David Doria
Is OpenMPI backwards compatible? I.e. If I am running 1.3.1 on one
machine and 1.3.3 on the rest, is it supposed to work? Or do they all
need exactly the same version?

When I add this wrong-version machine to the machine list, with a
simple "hello world from each process"-type program, I see no output
whatsoever, even with the verbose flag - it just sits there
indefinitely.

Thanks,

David


Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

2009-07-23 Thread Ralph Castain

What OMPI version are you using?

On Jul 23, 2009, at 3:00 PM, Sims, James S. Dr. wrote:

I have an OpenMPI  program compiled with a version of OpenMPI built  
using the ifort 10.1
compiler. I can compile and run this code with no problem, using the  
32 bit
version of ifort. And I can also submit batch jobs using torque with  
this 32-bit code.
However, compiling the same code to produce a 64 bit executable  
produces a code
that runs correctly only in the simplest cases. It does not run  
correctly when run
under the torque batch queuing system, running for a while and then  
giving a segmentation violation in a section of code that is fine in  
the 32-bit version.


I have to run the mpi multinode jobs using our torque batch queuing  
system,
but we do have the capability of running the jobs in an interactive  
batch environment.


If I do a qsub -I -l nodes=1:x4gb
I get an interactive session on the remote node assigned to my job.  
I can run the

job using either
./MPI_li_64 or
mpirun -np 1 ./MPI_li_64
and the job runs successfully to completion. I can also
start an interactive shell using
qsub -I -l nodes=1:ppn=2:x4gb
and I will get a single dual processor (or greater node). On this  
single node,

mpirun -np 2 ./MPI_li_64 works.
However, if instead I ask for two nodes in my interactive batch node,
qsub -I -l nodes=2:x4gb,
Two nodes will be assigned to me but when I enter
mpirun -np 2 ./MPI_li_64
the job runs a while, then fails with:
mpirun noticed that process rank 1 with PID 23104 on node n339  
exited on signal 11 (Segmentation fault).


I can trace this in the Intel debugger and see that the segmentation  
fault is occurring in what should
be good code, and in code that executes with no problem when  
everything is compiled 32-bit. I am
at a loss for what could be preventing this code from running within the  
batch queuing environment in the

64-bit version.

Jim
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Backwards compatibility?

2009-07-23 Thread Ralph Castain
I doubt those two would work together - however, a combination of  
1.3.2 and 1.3.3 should.


You might look at the ABI compatibility discussion threads (there have  
been several) on this list for the reasons. Basically, binary  
compatibility is supported starting with 1.3.2 and above.


On Jul 23, 2009, at 3:28 PM, David Doria wrote:


Is OpenMPI backwards compatible? I.e. If I am running 1.3.1 on one
machine and 1.3.3 on the rest, is it supposed to work? Or do they all
need exactly the same version?

When I add this wrong-version machine to the machine list, with a
simple "hello world from each process"-type program, I see no output
whatsoever, even with the verbose flag - it just sits there
indefinitely.

Thanks,

David
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?

2009-07-23 Thread Song, Kai Song
Hi Ralph,

With the flag -mca btl gm,sm,self, running the job manually works and has 
better performance, as you said!

However, it still hangs there when it goes through the PBS scheduler. 

Here is my PBS script:
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=00:02:00
#PBS -k eo

cd ~kaisong/test
echo `pwd`
cat $PBS_NODEFILE
echo "Starting parallel job"
/home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm,self --display-allocation 
--display-map -d 8 -v -machinefile $PBS_NODEFILE -np 4 ./hello-hostname
echo "ending parallel job"

The error message and output file from torque are the same as before. What other 
problem do you think it could be...? Please let me know if you need more 
information about our system.

Thanks a lot for helping me along this far! I hope we are getting close to finding 
out the real problem.

Kai

Kai Song
 1.510.486.4894
High Performance Computing Services (HPCS) Intern
Lawrence Berkeley National Laboratory - http://scs.lbl.gov


- Original Message -
From: Ralph Castain 
List-Post: users@lists.open-mpi.org
Date: Thursday, July 23, 2009 1:06 pm
Subject: Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?
To: "Song, Kai Song" 
Cc: Open MPI Users 

> My apologies - I had missed that -mca btl flag. That is the source of
> the trouble. IIRC, GM doesn't have a loopback method in it. OMPI
> requires that -every- proc be able to reach -every- proc, including
> itself.
> 
> So you must include the "self" btl at a minimum. Also, if you want
> more performance, you probably want to include the shared memory BTL
> as well.
> 
> So the recommended param would be:
> 
> -mca btl gm,sm,self
> 
> Order doesn't matter. I'm disturbed that it would hang when you run in
> batch, though, instead of abort. Try with this new flag and see if it
> runs in both batch and interactive mode.
> 
> HTH
> Ralph
> 
> On Jul 23, 2009, at 1:10 PM, Song, Kai Song wrote:
> 
> > Hi Ralph,
> >
> > Thanks for the fast reply! I put the --display-allocation and
> > --display-map flags on and it looks like the node allocation is just
> > fine, but the job still hangs.
> >
> > The output looks like this:
> > /home/kaisong/test
> > node0001
> > node0001
> > node
> > node
> > Starting parallel job
> >
> > ==   ALLOCATED NODES   ==
> >
> > Data for node: Name: node0001   Num slots: 2   Max slots: 0
> > Data for node: Name: node   Num slots: 2   Max slots: 0
> >
> > =
> >
> >    JOB MAP   
> >
> > Data for node: Name: node0001   Num procs: 2
> > Process OMPI jobid: [16591,1] Process rank: 0
> > Process OMPI jobid: [16591,1] Process rank: 1
> >
> > Data for node: Name: node   Num procs: 2
> > Process OMPI jobid: [16591,1] Process rank: 2
> > Process OMPI jobid: [16591,1] Process rank: 3
> >
> > =
> > (no hello world output, the job just hangs here until timeout).
> > And a similar thing appears in the error output:
> > node - daemon did not report back when launched
> >
> >
> > Then, I ran the job manually by adding the "-mca btl gm" flag for mpirun:
> > /home/software/ompi/1.3.2-pgi/bin/mpirun -mca btl gm --display-allocation
> > --display-map -v -machinefile ./node -np 4 ./hello-hostname
> >
> > MPI crashed with the following output/error:
> > ==   ALLOCATED NODES   ==
> >
> >  Data for node: Name: hbar.lbl.gov  Num slots: 0   Max slots: 0
> >  Data for node: Name: node0045  Num slots: 4   Max slots: 0
> >  Data for node: Name: node0046  Num slots: 4   Max slots: 0
> >  Data for node: Name: node0047  Num slots: 4   Max slots: 0
> >  Data for node: Name: node0048  Num slots: 4   Max slots: 0
> >
> > =
> >
> >     JOB MAP   
> >
> >  Data for node: Name: node0045  Num procs: 4
> > Process OMPI jobid: [62741,1] Process rank: 0
> > Process OMPI jobid: [62741,1] Process rank: 1
> > Process OMPI jobid: [62741,1] Process rank: 2
> > Process OMPI jobid: [62741,1] Process rank: 3
> >
> >  =
> > --
> 
> > At least one pair of MPI processes are unable to reach each other for
> > MPI communications.  This means that no Open MPI device has indicated
> > that it can be used to communicate between these processes.  This is
> > an error; Open MPI requires that all MPI processes be able to reach
> > each other.  This error can sometimes be the result of forgetting to
> > specify the "self" BTL.
> >
> >   Process 1 ([[62741,1],1]) is on host: node0045
> >   Process 2 ([[62741,1],1]) is on host: node0045
> >  

Re: [OMPI users] Backwards compatibility?

2009-07-23 Thread David Doria
On Thu, Jul 23, 2009 at 5:47 PM, Ralph Castain wrote:
> I doubt those two would work together - however, a combination of 1.3.2 and
> 1.3.3 should.
>
> You might look at the ABI compatibility discussion threads (there have been
> several) on this list for the reasons. Basically, binary compatibility is
> supported starting with 1.3.2 and above.

Ok - I'll make sure to use all the same version. Is there any way that
can be detected and an error thrown? It took me quite a while to
figure out that one machine was the wrong version.

Thanks,

David


Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

2009-07-23 Thread Sims, James S. Dr.
[sims@raritan openmpi]$ mpirun -V
mpirun (Open MPI) 1.3.1rc4


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
Ralph Castain [r...@open-mpi.org]
Sent: Thursday, July 23, 2009 5:44 PM
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel
compiler

What OMPI version are you using?

On Jul 23, 2009, at 3:00 PM, Sims, James S. Dr. wrote:

> I have an OpenMPI  program compiled with a version of OpenMPI built
> using the ifort 10.1
> compiler. I can compile and run this code with no problem, using the
> 32 bit
> version of ifort. And I can also submit batch jobs using torque with
> this 32-bit code.
> However, compiling the same code to produce a 64 bit executable
> produces a code
> that runs correctly only in the simplest cases. It does not run
> correctly when run
> under the torque batch queuing system, running for awhile and then
> giving a
> segmentation violation in s section of code that is fine in the 32
> bit version.
>
> I have to run the mpi multinode jobs using our torque batch queuing
> system,
> but we do have the capability of running the jobs in an interactive
> batch environment.
>
> If I do a qsub -I -l nodes=1:x4gb
> I get an interactive session on the remote node assigned to my job.
> I can run the
> job using either
> ./MPI_li_64 or
> mpirun -np 1 ./MPI_li_64
> and the job runs successfully to completion. I can also
> start an interactive shell using
> qsub -I -l nodes=1:ppn=2:x4gb
> and I will get a single dual processor (or greater node). On this
> single node,
> mpirun -np 2 ./MPI_li_64 works.
> However, if instead I ask for two nodes in my interactive batch node,
> qsub -I -l nodes=2:x4gb,
> Two nodes will be assigned to me but when I enter
> mpirun -np 2 ./MPI_li_64
> the job runs a while, then fails with:
> mpirun noticed that process rank 1 with PID 23104 on node n339
> exited on signal 11 (Segmentation fault).
>
> I can trace this in the Intel debugger and see that the segmentation
> fault is occurring in what should
> be good code, and in code that executes with no problem when
> everything is compiled 32-bit. I am
> at a loss for what could be preventing this code from running within the
> batch queuing environment in the
> 64-bit version.
>
> Jim
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

2009-07-23 Thread Ralph Castain

Okay - thanks!

First, be assured we run 64-bit ifort code under Torque at large scale  
all the time here at LANL, so this is likely to be something trivial  
in your environment.


A few things to consider/try:

1. most likely culprit is that your LD_LIBRARY_PATH is pointing to the  
32-bit libraries on the other nodes. Torque does -not- copy your  
environment by default, and neither does OMPI. Try adding


-x LD_LIBRARY_PATH

to your cmd line, making sure that the 64-bit libs are before any  
32-bit libs in that envar. This tells mpirun to pick up that envar and  
propagate it for you.
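For example, something along the lines of  
"mpirun -x LD_LIBRARY_PATH -np 2 ./MPI_li_64" (reusing the executable  
name from your message) will forward your local value of that envar to  
the remote procs.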


2. check to ensure you are in fact using a 64-bit version of OMPI. Run  
"ompi_info --config" to see how it was built. Also run "mpif90 -- 
showme" and see what libs it is linked to. Does your LD_LIBRARY_PATH  
match the names and ordering?


3. get a multi-node allocation and run "pbsdsh echo $LD_LIBRARY_PATH"  
and see what libs you are defaulting to on the other nodes.


I realize these are somewhat overlapping, but I think you catch the  
drift - I suspect you are getting the infamous "library confusion"  
problem.


HTH
Ralph

On Jul 23, 2009, at 7:49 PM, Sims, James S. Dr. wrote:


[sims@raritan openmpi]$ mpirun -V
mpirun (Open MPI) 1.3.1rc4


From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On  
Behalf Of Ralph Castain [r...@open-mpi.org]

Sent: Thursday, July 23, 2009 5:44 PM
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and  
intel compiler


What OMPI version are you using?

On Jul 23, 2009, at 3:00 PM, Sims, James S. Dr. wrote:


I have an OpenMPI  program compiled with a version of OpenMPI built
using the ifort 10.1
compiler. I can compile and run this code with no problem, using the
32 bit
version of ifort. And I can also submit batch jobs using torque with
this 32-bit code.
However, compiling the same code to produce a 64 bit executable
produces a code
that runs correctly only in the simplest cases. It does not run
correctly when run
under the torque batch queuing system, running for a while and then
giving a segmentation violation in a section of code that is fine in
the 32-bit version.

I have to run the mpi multinode jobs using our torque batch queuing
system,
but we do have the capability of running the jobs in an interactive
batch environment.

If I do a qsub -I -l nodes=1:x4gb
I get an interactive session on the remote node assigned to my job.
I can run the
job using either
./MPI_li_64 or
mpirun -np 1 ./MPI_li_64
and the job runs successfully to completion. I can also
start an interactive shell using
qsub -I -l nodes=1:ppn=2:x4gb
and I will get a single dual processor (or greater node). On this
single node,
mpirun -np 2 ./MPI_li_64 works.
However, if instead I ask for two nodes in my interactive batch node,
qsub -I -l nodes=2:x4gb,
Two nodes will be assigned to me but when I enter
mpirun -np 2 ./MPI_li_64
the job runs a while, then fails with:
mpirun noticed that process rank 1 with PID 23104 on node n339
exited on signal 11 (Segmentation fault).

I can trace this in the Intel debugger and see that the segmentation
fault is occurring in what should
be good code, and in code that executes with no problem when
everything is compiled 32-bit. I am
at a loss for what could be preventing this code from running within the
batch queuing environment in the
64-bit version.

Jim
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users