Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-30 Thread Jeff Squyres

On Oct 29, 2008, at 4:31 PM, Gustavo Seabra wrote:


Ugh.  IMHO, Cygwin != POSIX.

The problem is that we're making the assumption that if dlsym() is present, RTLD_NEXT is defined.  I guess that's not true for cygwin (lame).  I suppose that we could also check for RTLD_NEXT...?  Is there any other OS where dlsym() is present but RTLD_NEXT is not?

Would it be easier to run Linux in a virtual machine on your Windows host?  You'll probably get a lot better performance...?


Hi Jeff,

Are you sure RTLD_NEXT is part of the POSIX standard? I may be looking
in the wrong place, but apparently it is *not* part of the standard,
at least as defined here:

http://www.opengroup.org/onlinepubs/95399/basedefs/dlfcn.h.html


Fair enough -- my words were ambiguous, and probably overly broad.  I was trying to convey that my prior experience with Cygwin has biased me to believe that Cygwin tends to be "different" than other POSIX-like OSs, such as Linux, Solaris, and OS X.



It would seem that this is a GNU extension, so it becomes available
when __USE_GNU is defined. Now, looking at the Cygwin version of
dlfcn.h, I see that RTLD_NEXT is *not* defined, but RTLD_LAZY,
RTLD_NOW, RTLD_LOCAL and RTLD_GLOBAL are, which makes it compliant
with POSIX, but not GNU.

The 'memory_mallopt_component.c' only checks if 'HAVE_DLSYM' is
defined. If so, it defines __USE_GNU and then includes dlfcn.h. This
is OK assuming you have a GNU version of dlfcn.h, but apparently that
is not present in Cygwin...


Understood; this was a more complete/precise answer to my question "Is
there any other OS where dlsym() is present but RTLD_NEXT is not?"  I
suppose we can extend the configure test to check for RTLD_NEXT as
well.  That way, the component won't even decide to build itself.  You
won't need this component anyway, because it's only really useful for
the OpenFabrics and [ancient] Myricom GM drivers in Open MPI, neither
of which is likely to be supported in Cygwin.
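
Something along these lines is what I have in mind -- a sketch only,
not the actual component source (the function name is made up):

    /* Guard the GNU-only RTLD_NEXT lookup so the file still compiles
       on platforms (like Cygwin) whose dlfcn.h lacks the extension. */
    #if defined(HAVE_DLSYM)
    #define __USE_GNU
    #include <dlfcn.h>
    #endif

    #if defined(HAVE_DLSYM) && defined(RTLD_NEXT)
    /* Find the next definition of malloc() in the library search order. */
    static void *next_malloc(void)
    {
        return dlsym(RTLD_NEXT, "malloc");
    }
    #endif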


Also FWIW, my understanding is that running another OS in a VM (such  
as Linux or your favorite BSD) will run *much* faster than Cygwin.  I  
have dim recollections of LAM's and OMPI's "configure" script taking  
loong periods of time (I no longer have easy access to a Windows  
machine to do such testing).  Those with more Windows experience than  
me attributed it to Windows' process model implementation, which is  
quite different than Linux/Solaris/OSX/etc.  So I'm just curious: do  
you have a reason for preferring Cygwin instead of a VM?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-30 Thread George Bosilca

Gustavo,

As Jeff mentioned, this component is not required on Windows. You can
disable it completely in Open MPI and everything will continue to work
correctly. Please add --enable-mca-no-build=memory_mallopt, or maybe
the more generic --enable-mca-no-build=memory (as there is no need for
any memory manager on Windows).
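
For example (this is just the shape of the command; adjust to your own
tree):

    ./configure --enable-mca-no-build=memory
    make all install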


Just a word about performance. I think you have already noticed how
long the configure step is -- and believe me, that is fast compared
with building the whole of Open MPI. However, once built, Open MPI
(and most Cygwin applications) see their performance only slightly
affected by the fact that they run on Cygwin. Even the network
performance is acceptable.


It is possible to have a native version of Open MPI on Windows. There
are two ways to achieve this. First, install SFU and compile there. It
worked the last time I checked, but it's not the solution I prefer.
Second, you can install the Express edition of Microsoft Visual Studio
(which is free), set your PATH, LIB and INCLUDE to point to the
installation, and then use the cl compiler to build Open MPI natively
on Windows.
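
Roughly like this, from a Visual Studio command prompt (the paths are
only placeholders for wherever you installed things):

    set PATH=%PATH%;C:\some\install\bin
    set INCLUDE=%INCLUDE%;C:\some\install\include
    set LIB=%LIB%;C:\some\install\lib
    cl /c my_source.c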


  george.




Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-30 Thread Gustavo Seabra
On Thu, Oct 30, 2008 at 8:40 AM, Jeff Squyres wrote:
> On Oct 29, 2008, at 4:31 PM, Gustavo Seabra wrote:
>
>>> Ugh.  IMHO, Cygwin != POSIX.
>>>
>>> The problem is that we're making the assumption that if dlsym() is
>>> present,
>>> RTLD_NEXT is defined.  I guess that's not true for cygwin (lame).  I
>>> suppose
>>> that we could also check for RTLD_NEXT...?  Is there any other OS where
>>> dlsym() is present but RTLD_NEXT is not?
>>>
>>> Would it be easier to run Linux in a virtual machine on your windows
>>> host?
>>> You'll probably get a lot better performance...?
>>
>> Hi Jeff,
>>
>> Are you sure RTLD_NEXT is part of the POSIX standard? I may be looking
>> in the wrong place, but apparently it is *not* part of the standard,
>> at least as defined here:
>>
>> http://www.opengroup.org/onlinepubs/95399/basedefs/dlfcn.h.html
>
> Fair enough -- my words were ambiguous, and probably overly broad.  I was
> trying to convey that my prior experience with Cygwin has biased me to
> believe that Cygwin tends to be "different" than other POSIX-like OSs, such
> as Linux, Solaris, and OS X.
>
>> It would seem that this is a GNU extension, so it becomes available
>> when __USE_GNU is defined. Now, looking at the cygwin version of
>> dlfcn.h, I see that RTLD_NEXT is *not* defined, but RTLD_LAZY,
>> RTLD_NOW, RTLD_LOCAL and RTLD_GLOBAL are, which makes it compliant with
>> POSIX, but not GNU.
>>
>> The 'memory_mallopt_component.c' only checks if 'HAVE_DLSYM' is
>> defined. If so, it defines __USE_GNU and then includes dlfcn.h. This is
>> OK assuming you have a GNU version of dlfcn.h, but apparently that is
>> not present in Cygwin...
>
> Understood; this was a more complete/precise answer to my question "Is
> there any other OS where dlsym() is present but RTLD_NEXT is not?"  I
> suppose we can extend the configure test to check for RTLD_NEXT as well.
> That way, the component won't even decide to build itself.  You won't
> need this component anyway, because it's only really useful for the
> OpenFabrics and [ancient] Myricom GM drivers in Open MPI, neither of
> which is likely to be supported in Cygwin.


That should be good enough, at least for that part. Or it could test first
for the presence of OpenFabrics or Myricom? Maybe it could just test
for the existence of the GNU extensions? I don't know. I understand it
must be really hard to keep track of what is standard and what is not
these days. I'm just thankful that you guys are looking into it.
Thanks!

> Also FWIW, my understanding is that running another OS in a VM (such as
> Linux or your favorite BSD) will run *much* faster than Cygwin.  I have dim
> recollections of LAM's and OMPI's "configure" script taking loong
> periods of time (I no longer have easy access to a Windows machine to do
> such testing).  Those with more Windows experience than me attributed it to
> Windows' process model implementation, which is quite different than
> Linux/Solaris/OSX/etc.  So I'm just curious: do you have a reason for
> preferring Cygwin instead of a VM?

Well... I don't. It's just that, due to the specifics of my work, I need
to work on a Windows computer, but I also like to use many Unix
features/commands. So I just use Cygwin out of convenience, which
in a way gives me the best of both worlds without the need to dual
boot.

However, the other reason I use Cygwin is that I work on the
development of a program, and it is very convenient to do that in
Cygwin, especially when I'm traveling and only have access to my
laptop. Many users have this program running in Cygwin, so it's also
good to have a place to test it. I don't really use Cygwin for the
long "production" runs that would actually require an MPI; for those I
have access to local clusters or TeraGrid. My problem is testing the
parallel version in Cygwin (or whether any changes I make break the
parallel implementation), because I still have not managed to install
an MPI in Cygwin.

In fact, I have never tried a VM :-$ I guess I should give it a try
sometime. Do you have any recommendations? My only requirements are
that (i) it works, and (ii) it's free.

Thanks a lot!!
-- 
Gustavo Seabra
Postdoctoral Associate
Quantum Theory Project - University of Florida
Gainesville - Florida - USA


Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Brock Palen

Any thoughts on this?

We are looking at writing a script that parses $PBS_NODEFILE to create
a machinefile, and then using -machinefile.


When we do that, though, we have to disable tm to avoid an error
(-mca pls ^tm); this is far from preferable.
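
Something like this is the sketch we have in mind (untested; it assumes
ppn=2, so each node shows up on two consecutive lines of $PBS_NODEFILE):

    # Keep every other line: each remaining slot then has a sibling
    # core on the same node for the second thread.
    awk 'NR % 2 == 1' "$PBS_NODEFILE" > machinefile
    mpirun -mca pls ^tm -machinefile machinefile \
        -np $(wc -l < machinefile) app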


Any ideas for telling mpirun to launch on only half the CPUs given to
it by PBS, where each CPU must have another CPU adjacent to it in the
same node?


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 25, 2008, at 5:36 PM, Brock Palen wrote:

We have a user with a code that uses threaded solvers inside each  
MPI rank.  They would like to run two threads per process.


The question is how to launch this?  The default -byslot puts all
the processes on the first sets of CPUs, not leaving any CPUs for
the second thread of each process.  And half the CPUs are wasted.


The -bynode option would work in theory, if all our nodes had the same
number of cores (they do not).


So right now the user did:

#PBS -l nodes=22:ppn=2
export OMP_NUM_THREADS=2
mpirun -np 22 app

Which made me aware of the problem.

How can I basically tell OMPI that a 'slot' is two cores on the
same machine?  This needs to work inside our Torque-based
queueing system.


Sorry if I was not clear about my goal.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985









Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Ralph Castain
I believe I answered much of this the other day - did it get lost in  
the email?


As for using TM with a hostfile -- this is an unfortunate bug in the
1.2 series. You can't; you'll have to move to 1.3 to do so. When you
do, note the changed handling of hostfiles as specified on the wiki:


https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

Ralph


I take it this is using OMPI 1.2.x? If so, there really isn't a way  
to do this in that series.


If they are using 1.3 (in some pre-release form), then there are two  
options:


1. they could use the sequential mapper by specifying "-mca rmaps
seq". This mapper takes a hostfile and maps one process to each
entry, in rank order. So they could specify that we only map to half
of the actual number of cores on a particular node (see the sketch
after these options).


2. they could use the rank_file mapper that allows you to specify  
what cores are to be used by what rank. I am less familiar with this  
option and there isn't a lot of documentation on how to use it - but  
you may have to provide a fairly comprehensive map file since your  
nodes are not all the same.
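
As a sketch of option 1 (the hostnames here are made up): with two
threads per process on quad-core nodes, the hostfile would list each
node once per desired process, e.g.

    # hostfile "myhosts" for "-mca rmaps seq": one line per rank
    node01
    node01
    node02
    node02

and then:

    mpirun -np 4 -mca rmaps seq -hostfile myhosts ./app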


I have been asked by some other folks to provide a mapping option
"--stride x" that would cause the default round-robin mapper to step
across the specified number of slots. So a stride of 2 would
automatically cause byslot mapping to increment by 2 instead of the
current stride of 1. I doubt that will be in 1.3.0, but it will show
up in later releases.


Ralph






Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Reuti

On 30.10.2008, at 14:46, Brock Palen wrote:


Any thoughts on this?

We are looking at writing a script that parses $PBS_NODEFILE to create
a machinefile, and then using -machinefile.

When we do that, though, we have to disable tm to avoid an error
(-mca pls ^tm); this is far from preferable.


What about redefining the variable $PBS_NODEFILE to point to an
adjusted copy of the original file? With this, you could even use the
TM startup of the nodes, as mpirun would use the adjusted file, AFAICS.


If you know that you always request 2 cores per node, then starting
the threads is up to you. As you got two cores, it's safe.


-- Reuti






Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Ralph Castain


On Oct 30, 2008, at 8:13 AM, Reuti wrote:


On 30.10.2008, at 14:46, Brock Palen wrote:


Any thoughts on this?

We are looking at writing a script that parses $PBS_NODEFILE to create
a machinefile, and then using -machinefile.

When we do that, though, we have to disable tm to avoid an error
(-mca pls ^tm); this is far from preferable.


What about redefining the variable $PBS_NODEFILE to point to an
adjusted copy of the original file? With this, you could even use the
TM startup of the nodes, as mpirun would use the adjusted file, AFAICS.


Probably won't work. The problem is that TM doesn't launch based on
node name -- it launches based on a TM-defined "launchid", which is
computed from the location of the slot in the list in the
PBS_NODEFILE.


So if you mess with the nodefile, there is no guarantee that the
launchid we compute when reading the file will match what Torque
thinks it assigned. This has been fixed in 1.3, but remains a
constraint in 1.2.


Ralph








[OMPI users] Equivalent .h files

2008-10-30 Thread Benjamin Lamptey
Hello,
I am new at using open-mpi and would like to know something basic.

What is the equivalent of the "mpif.h" in open-mpi, which is normally
"included" at the beginning of MPI codes (Fortran in this case)?

I shall appreciate that for cpp as well.

Thanks
Ben


Re: [OMPI users] Equivalent .h files

2008-10-30 Thread Brock Palen

If you're using Fortran 90, the mpi module is best:

use mpi

If Fortran 77 (or you don't have a working module):

include 'mpif.h'

Just like any other MPI library.
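
For example, with the Open MPI wrappers:

    mpif90 -o hello hello.f90    # Fortran 90, picks up the mpi module
    mpif77 -o hello hello.f      # Fortran 77, picks up mpif.h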

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985







Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Brock Palen

Yes I never made it to my mailbox.  Strange, (wink wink, ahh email).

Thanks for letting me know about it, I have the message now.

as for using 1.3 prerelease, that is not really an option right now  
for us.  I think we can get by with 1.2 without threads or do some  
hacking (ppn=largest number we have  launch with -bynode).

TIll a 1.3 stable is out.

Thanks, new features for launching look really neat.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 30, 2008, at 10:12 AM, Ralph Castain wrote:

I believe I answered much of this the other day - did it get lost  
in the email?


As for using TM with a hostfile - this is an unfortunately bug in  
the 1.2 series. You can't - you'll have to move to 1.3 to do so.  
When you do, note the changed handling of hostfiles as specified on  
the wiki:


https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

Ralph


I take it this is using OMPI 1.2.x? If so, there really isn't a  
way to do this in that series.


If they are using 1.3 (in some pre-release form), then there are  
two options:


1. they could use the sequential mapper by specifying "-mca rmaps  
seq". This mapper takes a hostfile and maps one process to each  
entry, in rank order. So they could specify that we only map to  
half of the actual number of cores on a particular node


2. they could use the rank_file mapper that allows you to specify  
what cores are to be used by what rank. I am less familiar with  
this option and there isn't a lot of documentation on how to use  
it - but you may have to provide a fairly comprehensive map file  
since your nodes are not all the same.


I have been asked by some other folks to provide a mapping option  
"--stride x" that would cause the default round-robin mapper to  
step across the specified number of slots. So a stride of 2 would  
automatically cause byslot mapping to increment by 2 instead of  
the current stride of 1. I doubt that will be in 1.3.0, but it  
will show up in later releases.


Ralph


On Oct 30, 2008, at 7:46 AM, Brock Palen wrote:


Any thoughts on this?

We are looking writing a script that parses $PBS_NODEFILE to  
create a machinefile and using -machinefile


When we do that though we have to disable tm to avoid an error (- 
mca pls ^tm) this is far from preferable.


Any ideas to tell mpirun to only launch on half the cpus given to  
it by PBS, but each cpu must have adjacent to it another cpu in  
the same node?


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Oct 25, 2008, at 5:36 PM, Brock Palen wrote:

We have a user with a code that uses threaded solvers inside each  
MPI rank.  They would like to run two threads per process.


The question is how to launch this?  The default -byslot puts all  
the processes on the first sets of cpus not leaving any cpus for  
the second thread for each process.  And half the cpus are wasted.


The -bynode option works in theory, if all our nodes had the same  
number of core (they do not).


So right now the user did:

#PBS -l nodes=22:ppn=2
export OMP_NUM_THREADS=2
mpirun -np 22 app

Which made me aware of the problem.

How can I basically tell OMPI that a 'slot'  is two cores on the  
same machine?This needs to work inside out torque based  
queueing system.


Sorry If I was not clear about my goal.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [OMPI users] Equivalent .h files

2008-10-30 Thread Eugene Loh

Benjamin Lamptey wrote:


Hello,
I am new at using open-mpi and would like to know something basic.

What is the equivalent of the "mpif.h" in open-mpi, which is normally
"included" at the beginning of MPI codes (Fortran in this case)?

I shall appreciate that for cpp as well.


For Fortran:

 INCLUDE "mpif.h"

For C:

#include <mpi.h>

If you use the compiler wrappers (mpif90, mpicc, etc.), they should be 
able to find the include files.
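
For instance, this minimal C program builds with just "mpicc hello.c
-o hello", and the wrapper finds mpi.h on its own:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);               /* start MPI        */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which rank am I? */
        printf("hello from rank %d\n", rank);
        MPI_Finalize();                       /* shut MPI down    */
        return 0;
    }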


MPI source codes are *supposed* to run unchanged as you move them from 
one MPI to another, even if the procedures for compiling and launching 
those codes vary.


P.S.  It's "OpenMPI" rather than "open-mpi".  :^)


Re: [OMPI users] Equivalent .h files

2008-10-30 Thread Gus Correa


Benjamin Lamptey wrote:


Hello,
I am new at using open-mpi and would like to know something basic.

What is the equivalent of the "mpif.h" in open-mpi, which is normally
"included" at the beginning of MPI codes (Fortran in this case)?


Hello Benjamin and List

As far as I know, it is just the same old "mpif.h".  :)

Make sure you point to the right (OpenMPI) mpif77 or mpif90 wrapper
when you compile your code, so as to get the correct mpif.h.
Use a full path name if needed.
A common mistake is to inadvertently use another wrapper (say, from
MPICH or LAM), since some Linux distributions come with tons of MPI
versions, compiler wrappers, and include files.


In most cases mpif77/mpif90 will do the preprocessing phase too,
if your source file names have the correct suffix (.F or .F90, instead
of .f or .f90).

Using the mpif77/mpif90 compiler wrappers is probably your best choice.
However, if your Makefile has a separate preprocessing phase with cpp,
you need to add the corresponding "-I/my/path/to/openmpi/include"
clause to it; otherwise it may not find the correct mpif.h file and may
use a wrong one.
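
For instance (the install prefix below is only an example; use your own):

    cpp -traditional -I/usr/local/openmpi/include mycode.F > mycode.f
    mpif90 -c mycode.f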

Works for me.
I hope it helps you.

Gus Correa

--
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA








Re: [OMPI users] Equivalent .h files

2008-10-30 Thread Benjamin Lamptey
Brock,
I am using the g95 compiler on Mac OS X.

I had
include 'mpif.h'

and I got the message "could not open mpif.h".

At your suggestion, I have added
USE mpi
include 'mpif.h'

I get the message "Can't open module file 'mpi.mod'".

What am I doing wrong?
Thanks
Ben



Re: [OMPI users] Equivalent .h files

2008-10-30 Thread Brock Palen
If you are using the module ('use mpi'), then don't also have
"include 'mpif.h'" -- use only one of those.

Make sure you use 'mpif90' to compile. Also make sure to read the
other reply on this list about include paths for headers and modules.
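
You can check what the wrapper really invokes (and that it points at
the Open MPI you expect) with:

    mpif90 -showme               # prints the underlying compile command
    mpif90 -o hello hello.f90    # mpi.mod is then on the search path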


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985







Re: [OMPI users] Mixed Threaded MPI code, how to launch?

2008-10-30 Thread Ralph Castain

Brock

I have a patch for 1.2 that will allow hostfiles to work with TM. If  
it would help, I can send it to you off-list.


Ralph





Re: [OMPI users] Fwd: Problems installing in Cygwin

2008-10-30 Thread Jeff Squyres

On Oct 30, 2008, at 9:18 AM, Gustavo Seabra wrote:

Understood; this was a more complete/precise answer to my question "Is
there any other OS where dlsym() is present but RTLD_NEXT is not?"  I
suppose we can extend the configure test to check for RTLD_NEXT as
well.  That way, the component won't even decide to build itself.  You
won't need this component anyway, because it's only really useful for
the OpenFabrics and [ancient] Myricom GM drivers in Open MPI, neither
of which is likely to be supported in Cygwin.


That should be good enough, at least for that part. Or it could test first
for the presence of OpenFabrics or Myricom? Maybe it could just test
for the existence of the GNU extensions? I don't know. I understand it
must be really hard to keep track of what is standard and what is not
these days. I'm just thankful that you guys are looking into it.
Thanks!


My plate is pretty full trying to get v1.3.0 out the door and prepare  
for SC -- I don't know if this will be fixed before then.  But I've  
opened a ticket about it:


https://svn.open-mpi.org/trac/ompi/ticket/1618


Well... I don't. It's just that, due to the specifics of my work, I need
to work on a Windows computer, but I also like to use many Unix
features/commands. So I just use Cygwin out of convenience, which
in a way gives me the best of both worlds without the need to dual
boot.


Fair enough.


However, the other reason I use Cygwin is that I work on the
development of a program, and it is very convenient to do that in
Cygwin, especially when I'm traveling and only have access to my
laptop. Many users have this program running in Cygwin, so it's also
good to have a place to test it. I don't really use Cygwin for the
long "production" runs that would actually require an MPI; for those I
have access to local clusters or TeraGrid. My problem is testing the
parallel version in Cygwin (or whether any changes I make break the
parallel implementation), because I still have not managed to install
an MPI in Cygwin.

In fact, I have never tried a VM :-$ I guess I should give it a try
sometime. Do you have any recommendations? My only requirements are
that (i) it works, and (ii) it's free.


I don't know if there are free VMs or not, but you could try a 30-day
free trial of VMware (or equivalent) and see if you like it.  IIRC,
it's not terribly expensive if you end up liking it.  :-)


FWIW: I use Parallels (an OS X VM) on my Mac because Cisco lives and  
dies by Outlook calendaring.  :-)


--
Jeff Squyres
Cisco Systems



[OMPI users] Issues with MPI_Type_create_darray

2008-10-30 Thread Antonio Molins

Hi all,

I am having some trouble with this function. I want to map data to a
2x2 block-cyclic configuration in C, using the following code:


MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->nx;
array_of_gsizes[1] = this->ny;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pc + pr*Pc;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs,
                       array_of_dargs, array_of_psizes, MPI_ORDER_C,
                       MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
int typesize;
MPI_Aint typeextent;
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, type extent "
       "is %d doubles (up to %d).", pr, pc, typesize/(int)sizeof(double),
       (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
              MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                  MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);



This works well when used like this, but the problem is that the matrix
itself is written to disk in column-major fashion, so I want to use the
code as if I were reading it transposed, that is:


MPI_Barrier(blacs_comm);
// size of each matrix
int *array_of_gsizes = new int[2];
array_of_gsizes[0] = this->ny;
array_of_gsizes[1] = this->nx;
// block-cyclic distribution used by ScaLAPACK
int *array_of_distrs = new int[2];
array_of_distrs[0] = MPI_DISTRIBUTE_CYCLIC;
array_of_distrs[1] = MPI_DISTRIBUTE_CYCLIC;
int *array_of_dargs = new int[2];
array_of_dargs[0] = BLOCK_SIZE;
array_of_dargs[1] = BLOCK_SIZE;
int *array_of_psizes = new int[2];
array_of_psizes[0] = Pr;
array_of_psizes[1] = Pc;
int rank = pr + pc*Pr;
MPI_Type_create_darray(Pr*Pc, rank, 2, array_of_gsizes, array_of_distrs,
                       array_of_dargs, array_of_psizes, MPI_ORDER_C,
                       MPI_DOUBLE, &this->datatype);
MPI_Type_commit(&this->datatype);
int typesize;
MPI_Aint typeextent;
MPI_Type_size(this->datatype, &typesize);
MPI_Type_extent(this->datatype, &typeextent);
printf("type size for process rank (%d,%d) is %d doubles, type extent "
       "is %d doubles (up to %d).", pr, pc, typesize/(int)sizeof(double),
       (int)(typeextent/sizeof(double)), nx*ny);
MPI_File_open(blacs_comm, (char*)filename, MPI_MODE_RDWR,
              MPI_INFO_NULL, &this->fid);
MPI_File_set_view(this->fid, this->offset + i*nx*ny*sizeof(double),
                  MPI_DOUBLE, this->datatype, "native", MPI_INFO_NULL);


To my surprise, this code crashes while calling MPI_File_set_view()!!!
And before you ask, I did try switching MPI_ORDER_C to
MPI_ORDER_FORTRAN; I got the same results I am reporting here.


Also, I am quite intrigued by the text output of each of these
programs: the first one will report:

    type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
    type size for process rank (1,0) is 20 doubles, type extent is 119 doubles (up to 91).
    type size for process rank (0,1) is 24 doubles, type extent is 95 doubles (up to 91).
    type size for process rank (1,1) is 15 doubles, type extent is 123 doubles (up to 91).


Anybody know why the extents are not equal???

Even weirder, the second one will report:

    type size for process rank (0,0) is 32 doubles, type extent is 91 doubles (up to 91).
    type size for process rank (1,0) is 20 doubles, type extent is 95 doubles (up to 91).
    type size for process rank (0,1) is 24 doubles, type extent is 143 doubles (up to 91).
    type size for process rank (1,1) is 15 doubles, type extent is 147 doubles (up to 91).


The extents changed! I think this is somehow related to the subsequent
crash of MPI_File_set_view(), but that's as far as I can understand...
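
One thing I still plan to try (a sketch using the MPI-2 calls) is to
print the lower bound as well, since MPI_Type_extent alone hides it and
MPI_File_set_view tiles the file by the filetype's (lb, extent) pair:

    /* Query lb/extent and the "true" (data-spanning) values. */
    MPI_Aint lb, extent, true_lb, true_extent;
    MPI_Type_get_extent(this->datatype, &lb, &extent);
    MPI_Type_get_true_extent(this->datatype, &true_lb, &true_extent);
    printf("lb=%ld extent=%ld true_lb=%ld true_extent=%ld\n",
           (long)lb, (long)extent, (long)true_lb, (long)true_extent);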


Any clue about what is happening? I attach the trace below.

Best,
A


Antonio Molins, PhD Candidate
Medical Engineering and Medical Physics
Harvard - MIT Division of Health Sciences and Technology
--
"When a traveler reaches a fork in the road,
the ℓ1 -norm tells him to take either one way or the other,
but the ℓ2 -norm instructs him to head off into the bushes. "

John F. Claerbout and Francis Muir, 1973


*** glibc detected *** doubl