[OMPI users] default current working directory of started application

2009-04-16 Thread Jerome BENOIT

Hello List,

in the FAQ section "Running MPI jobs", point 12, we read:

-wdir : Set the working directory of the started applications.
If not supplied, the current working directory is assumed
(or $HOME, if the current working directory does not exist on all nodes).
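
For instance, an invocation along these lines (the path /scratch/run1 and the program name are only illustrative) sets the working directory explicitly:

  mpirun -wdir /scratch/run1 -np 4 ./a.out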

Is there a way to configure the fallback directory that is assumed instead (here $HOME)?


Thanks in advance,
Jerome


[OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Jerome BENOIT

Hello List,

I have just noticed that, when MPI_Comm_spawn is used to launch programs around,
the orted working directory on the nodes is the working directory of the spawning
program: can we ask orted to use another directory?

Thanks in advance,
Jerome 


Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Jerome BENOIT

Hello Again,

Jerome BENOIT wrote:

Hello List,

I have just noticed that, when MPI_Comm_spawn is used to launch programs around,
the orted working directory on the nodes is the working directory of the spawning program:

can we ask orted to use another directory?


Changing the working directory via chdir before spawning with MPI_Comm_spawn
changes nothing: the orted working directory on the nodes seems to be imposed
by something else. As I run OMPI on top of SLURM, I guess this is related to
SLURM.

Jerome



Thanks in advance,
Jerome



[OMPI users] An mpirun question

2009-04-16 Thread Min Zhu
Dear all,

 

I wonder if you could help me with this question.

I have got 3 Linux servers with 8 processors on each server. If I want
to run a job using the mpirun command, is there any way to specify the
number of processors to be used on each server? At the moment I find
that I can only issue a command such as "mpirun -np 14 -host cfd1,cfd2
./wrf.exe", which runs the job using 7 processors on each of the servers
cfd1 and cfd2. Can I specify, say, using 8 processors on cfd1 and 6
processors on cfd2? I ask this question because I have found that
different combinations of processors on those servers can influence the
computation time dramatically. Thank you very much in advance,

 

Cheers,

 

Min Zhu



Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Jerome BENOIT

Hi !

finally I got it:
passing the MCA key/value `"plm_slurm_args"/"--chdir /local/folder"' does the trick.

As a matter of fact, my code passes the MPI_Info key/value `"wdir"/"/local/folder"'
to MPI_Comm_spawn as well: the working directories on the nodes of the spawned programs
are `nodes:/local/folder' as expected, but the working directory of the orteds
is the working directory of the parent program. My guess is that the MPI_Info key/value
may also be passed to `srun'.
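
For reference, the same MCA key/value can also be given on the mpirun command line instead of through a parameter file; the path and the spawning program name below are only illustrative:

  mpirun --mca plm_slurm_args "--chdir /local/folder" -np 1 ./spawner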

hth,
Jerome








Re: [OMPI users] An mpirun question

2009-04-16 Thread Terry Frankcombe
Hi Min Zhu

You need to read about hostfiles and bynode/byslot scheduling.  See
here:
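
As an illustrative sketch (the hostfile name is arbitrary; the hostnames and slot counts are taken from your example), a hostfile lets you give each server its own slot count and then let mpirun fill the slots in order:

  # my_hostfile
  cfd1 slots=8
  cfd2 slots=6

  mpirun -np 14 --hostfile my_hostfile ./wrf.exe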


Ciao





Re: [OMPI users] libnuma issue

2009-04-16 Thread Francesco Pietra
Did not work the way I implemented the suggestion.

./configure CC=/..cce..icc CXX=/..cce..icpc F77=/..fce..ifort
FC=/..fce..ifort --with-libnuma=/usr --prefix=/usr --enable-static

./configure CC=/..cce..icc CXX=/..cce..icpc F77=/..fce..ifort
FC=/..fce..ifort --with-libnuma=/usr --prefix=/usr
then editing the Makefile by adding "LDFLAGS = -static-intel"

./configure CC=/..cce..icc CXX=/..cce..icpc F77=/..fce..ifort
FC=/..fce..ifort --with-libnuma=/usr --prefix=/usr
then editing the Makefile by replacing "LDFLAGS" with "LDFLAGS = -static-intel"

In all 3 cases orterun gave the error: libimf.so not found (the library was
sourced with the *.sh intel scripts).

francesco

On Thu, Apr 16, 2009 at 4:43 AM, Nysal Jan  wrote:
> You could try statically linking the Intel-provided libraries. Use
> LDFLAGS=-static-intel
>
> --Nysal
>
> On Wed, 2009-04-15 at 21:03 +0200, Francesco Pietra wrote:
>> On Wed, Apr 15, 2009 at 8:39 PM, Prentice Bisbal  wrote:
>> > Francesco Pietra wrote:
>> >> I used --with-libnuma=/usr since Prentice Bisbal's suggestion and it
>> >> worked. Unfortunately, I found no way to fix the failure in finding
>> >> libimf.so when compiling openmpi-1.3.1 with intels, as you have seen
>> >> in other e-mail from me. And gnu compilers (which work well with both
>> >> openmpi and the slower code of my application) are defeated by the
>> >> faster code of my application. With limited hardware resources, I must
>> >> rely on that 40% speeding up.
>> >>
>> >
>> > To fix the libimf.so problem you need to include the path to Intel's
>> > libimf.so in your LD_LIBRARY_PATH environment variable. On my system, I
>> > installed v11.074 of the Intel compilers in /usr/local/intel, so my
>> > libimf.so file is located here:
>> >
>> > /usr/local/intel/Compiler/11.0/074/lib/intel64/libimf.so
>> >
>> > So I just add that to my LD_LIBRARY_PATH:
>> >
>> > LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
>> > export LD_LIBRARY_PATH
>>
>> Just a clarification: With my system I use the latest intels version
>> 10, 10.1.2.024, and mkl 10.1.2.024 because it proved difficult to make
>> a debian package with version 11. At
>>
>> echo $LD_LIBRARY_PATH
>>
>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:opt/intel/fce/10.1.022/lib:/usr/local/lib
>>
>> (that /lib contains libimf.so)
>>
>> That results from sourcing in my .bashrc:
>>
>> . /opt/intel/fce/10.1.022/bin/ifortvars.sh
>> . /opt/intel/cce/10.1.022/bin/iccvars.sh
>>
>>  Did you suppress that sourcing before exporting the LD_EXPORT_PATH to
>> the library at issue? Having turned the problem around so much, it is
>> not unlikely that I am messing things up myself.
>>
>> thanks
>> francesco
>>
>>
>> >
>> > Now I can run whatever programs need libimf.so without any problems. In
>> > your case, you'll want to do that before your make command.
>> >
>> > Here's exactly what I use to compile OpenMPI with the Intel Compilers:
>> >
>> > export PATH=/usr/local/intel/Compiler/11.0/074/bin/intel64:$PATH
>> >
>> > export
>> > LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
>> >
>> > ../configure CC=icc CXX=icpc F77=ifort FC=ifort
>> > --prefix=/usr/local/openmpi-1.2.8/intel-11/x86_64 --disable-ipv6
>> > --with-sge --with-openib --enable-static
>> >
>> > --
>> > Prentice



[OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Jeff Squyres

I believe that Nysal was referring to

  ./configure CC=icc CXX=icpc F77=ifort FC=ifort LDFLAGS=-static-intel --prefix=/usr


This method makes editing your shell startup files unnecessary for running
on remote nodes, but you'll still need those files sourced for interactive
use of the intel compilers and/or for running intel-compiler-generated
executables locally.


I'm guessing that you're not sourcing the intel .sh files for non-interactive
logins.  You'll need to check your shell startup files and ensure that those
sourcing lines are executed when you login to remote nodes non-interactively.
E.g.:


  thisnode$ ssh othernode env | sort

shows the relevant stuff in your environment on the other node.  Note  
that this is different than


  thisnode$ ssh othernode
  othernode$ env | sort







--
Jeff Squyres
Cisco Systems



Re: [OMPI users] default current working directory of started application

2009-04-16 Thread Ralph Castain
Not currently - could be done if there is a strong enough reason to do  
so. Generally, though, the -wdir option seems to do the same thing. Is  
there a problem with it, or some need it doesn't satisfy?



On Apr 15, 2009, at 11:00 PM, Jerome BENOIT wrote:


Hello List,

in FAQ Running MPI jobs, point 12, we read:

-wdir : Set the working directory of the started  
applications.

If not supplied, the current working directory is assumed
(or $HOME, if the current working directory does not exist on all  
nodes).


Is there a way to configure the default alternative assumed current  
directory (here $HOME) ?

Thanks in advance,
Jerome




Re: [OMPI users] default current working directory of started application

2009-04-16 Thread Jerome BENOIT

Hi !

thanks for the reply.

On a cluster with homeless worker nodes, when the worker programs are spawned via
MPI_Comm_spawn{,_multiple}, it would be nice to set up a default fallback working
directory which is local (rather than global, as $HOME is) via a local configuration
file: you can only play with the wdir key in MPI_Info if the cluster is homogeneous
enough.

hth,
Jerome

Ralph Castain wrote:
Not currently - could be done if there is a strong enough reason to do 
so. Generally, though, the -wdir option seems to do the same thing. Is 
there a problem with it, or some need it doesn't satisfy?



On Apr 15, 2009, at 11:00 PM, Jerome BENOIT wrote:


Hello List,

in FAQ Running MPI jobs, point 12, we read:

-wdir : Set the working directory of the started applications.
If not supplied, the current working directory is assumed
(or $HOME, if the current working directory does not exist on all nodes).

Is there a way to configure the default alternative assumed current 
directory (here $HOME) ?

Thanks in advance,
Jerome







Re: [OMPI users] default current working directory of started application

2009-04-16 Thread Ralph Castain
Hokay, I can see that. Are you looking for an mca param that  
specifies a file that might contain config info we should read when  
starting up the orted? What would this local configuration file look  
like (e.g., what kind of config directives would you need/want), would  
you provide it on the node where mpirun is or would it be on every  
remote node, etc?


All things are doable - the devil is in defining the details. :-)





Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Ralph Castain
The orteds don't pass anything from MPI_Info to srun during a  
comm_spawn. What the orteds do is to chdir to the specified wdir  
before spawning the child process to ensure that the child has the  
correct working directory, then the orted changes back to its default  
working directory.


The orted working directory is set by the base environment. So your  
slurm arguments cause *all* orteds to use the specified directory as  
their "home base". They will then use any given wdir keyval when they  
launch their respective child processes, as described above.


As a side note, it isn't clear to me why you care about the orted's  
working directory. The orteds don't write anything there, or do  
anything with respect to their "home base" - so why would this matter?  
Or are you trying to specify the executable's path relative to where  
the orted is sitting?







Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Jerome BENOIT

Hi,

thanks for the reply.

Ralph Castain wrote:
The orteds don't pass anything from MPI_Info to srun during a 
comm_spawn. What the orteds do is to chdir to the specified wdir before 
spawning the child process to ensure that the child has the correct 
working directory, then the orted changes back to its default working 
directory.


The orted working directory is set by the base environment. So your 
slurm arguments cause *all* orteds to use the specified directory as 
their "home base". They will then use any given wdir keyval when they 
launch their respective child processes, as described above.


As a side note, it isn't clear to me why you care about the orted's 
working directory. The orteds don't write anything there, or do anything 
with respect to their "home base" - so why would this matter? Or are you 
trying to specify the executable's path relative to where the orted is 
sitting?



Let me be specific. My worker nodes are homeless: the /home directory is
automounted (when needed) from the master node. The orteds don't write anything
there, but they keep it mounted! The idea is to avoid this by specifying a local
working directory.

Jerome









Re: [OMPI users] default current working directory of started application

2009-04-16 Thread Jerome BENOIT

Hi Again,



Ralph Castain wrote:
Hokay, I can see that. Are you looking for an mca param that 
specifies a file that might contain config info we should read when 
starting up the orted? 


An MCA param sounds appropriate.
Here, as far as I can understand, orted itself is not involved: on my SLURM cluster
I added, as a regular user, the line `plm_slurm_args = --chdir=/local/' to my local
`~/.openmpi/mca-params.conf' file, where the /local directories are node-local.
That takes care of the orted part.

For the wdir of the spawned programs, unless it is specified through MPI_Info,
the last assumed default is $HOME: this is the default that should be configurable.

By the way, it would be nice to be able to use tokens of the form %T (see
sshd_config) in mca-params.conf. For my previous example, with the sshd_config
convention:

plm_slurm_args = --chdir=/local/%u

in the system-wide configuration file `/etc/openmpi/openmpi-mca-params.conf',
or something like

plm_slurm_args = --chdir=$LOCALDIR

hth,
Jerome




What would this local configuration file look 
like (e.g., what kind of config directives would you need/want), would 
you provide it on the node where mpirun is or would it be on every 
remote node, etc?


All things are doable - the devil is in defining the details. :-)






Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Ralph Castain

Thanks! That does indeed help clarify.

You should also then configure OMPI with --disable-per-user-config-files.
MPI procs will automatically look at the default MCA parameter file, which is
probably on your master node (wherever mpirun was executed). However, they also
look at the user's home directory for any user default param file and/or binary
modules. So the home directory will again be automounted, this time by the MPI
procs.


We created that option specifically to address the problem you  
describe. Hope it helps.
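
A minimal sketch of such a build (the installation prefix is illustrative; the flag is the one named above):

  ./configure --prefix=/usr/local --disable-per-user-config-files
  make all install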







Re: [OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Francesco Pietra
On Thu, Apr 16, 2009 at 3:04 PM, Jeff Squyres  wrote:
> I believe that Nysal was referring to
>
>  ./configure CC=icc CXX=icpc F77=ifort FC=ifort LDFLAGS=-static-intel
> --prefix=/usr

I have completely removed openmpi-1.2.3 and reinstalled in /usr/local
from source on a Tyan S2895.

From my .bashrc:

#For intel Fortran and C/C++ compilers

. /opt/intel/fce/10.1.022/bin/ifortvars.sh
. /opt/intel/cce/10.1.022/bin/iccvars.sh

#For openmpi

if [ "$LD_LIBRARY_PATH" ] ; then
   export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
else
   export LD_LIBRARY_PATH="/usr/local/lib"
fi

===
francesco@tya64:~$ echo $PATH
/opt/intel/cce/10.1.022/bin:/opt/intel/fce/10.1.022/bin:/usr/local/bin/vmd:/usr/local/chimera/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/home/francesco/hole2/exe:/usr/local/amber9/exe
francesco@tya64:~$

francesco@tya64:~$ echo $LD_LIBRARY_PATH
/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
francesco@tya64:~$

francesco@tya64:~$ ssh 192.168.1.33 env | sort
HOME=/home/francesco
LANG=en_US.UTF-8
LOGNAME=francesco
MAIL=/var/mail/francesco
PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
PWD=/home/francesco
SHELL=/bin/bash
SHLVL=1
SSH_CLIENT=192.168.1.37 33941 22
SSH_CONNECTION=192.168.1.37 33941 192.168.1.33 22
USER=francesco
_=/usr/bin/env
francesco@tya64:~$

where 192.168.1.33 is my remote desktop in the internal network, and I am
launching ssh from the Tyan where openmpi has just been installed
(it also works if I do it toward another parallel computer).
==
francesco@tya64:~$ ssh 192.168.1.37 date
Thu Apr 16 17:12:38 CEST 2009
francesco@tya64:~$

where 192.168.1.37 is the Tyan computer on which I am working with openmpi;
i.e., the passwordless date shows that this computer also knows itself, as
is true for all other computers on the internal network.
===

Now with openmpi-1.3.1:

francesco@tya64:/usr/local/openmpi-1.3.1$ ./configure
CC=/opt/intel/cce/10.1.022/bin/icc
CXX=/opt/intel/cce/10.1.022/bin/icpc
F77=/opt/intel/fce/10.1.022/bin/ifort
FC=/opt/intel/fce/10.1.022/bin/ifort LDFLAGS=-static-intel
--with-libnuma=/usr --prefix=/usr/local

no warnings

# make all install

no warnings

($ and # mean user and superuser, resp)

with the connectivity_c test, again the orterun error: libimf.so not found.

Please note that I am not new to openmpi. I have worked for more than
a couple of years without any problem on these same machines with
versions 1.2.3 and 1.2.6. With the latter, when I upgraded from Debian
amd64 etch to the new stable amd64 lenny, amber was still parallelized
nicely. Then I changed the disks of the raid1 to larger ones, tried to
recover the previous installations of the codes, and found them broken
on the new OS installation. Everything non-parallelized was easily
fixed, while with openmpi-1.3.1 (I upgraded to this) I ran into the
issues described.

As far as I have tested, the OS is in order, and ssh, as shown above,
has no problem.

Given my inexperience as a system analyzer, I assume that I am messing
something up; unfortunately, I was unable to discover where. An editor
is waiting for the completion of calculations requested by a referee,
and I am unable to answer.

thanks a lot for all you have tried to put me on the right road

francesco



Re: [OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Jeff Squyres

On Apr 16, 2009, at 11:29 AM, Francesco Pietra wrote:


francesco@tya64:~$ ssh 192.168.1.33 env | sort
HOME=/home/francesco
LANG=en_US.UTF-8
LOGNAME=francesco
MAIL=/var/mail/francesco
PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
PWD=/home/francesco
SHELL=/bin/bash
SHLVL=1
SSH_CLIENT=192.168.1.37 33941 22
SSH_CONNECTION=192.168.1.37 33941 192.168.1.33 22
USER=francesco
_=/usr/bin/env
francesco@tya64:~$



I don't see the intel compiler variables set in there, nor an  
LD_LIBRARY_PATH indicating where the intel libraries are located.  See  
my text from the last mail:



> I'm guessing that you're not sourcing the intel .sh files for
> non-interactive logins.  You'll need to check your shell startup files and
> ensure that those sourcing lines are executed when you login to remote nodes
> non-interactively.  E.g.:
>
>   thisnode$ ssh othernode env | sort
>
> shows the relevant stuff in your environment on the other node.  Note that
> this is different than
>
>   thisnode$ ssh othernode
>   othernode$ env | sort



You might well have some logic in your .bashrc that quits before fully
executing when running non-interactive logins; hence, the
". /opt/intel/fce/10.1.022/bin/ifortvars.sh" lines don't execute on the
192.168.1.33 machine when you run non-interactive jobs.
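
A common culprit is a guard near the top of a Debian-style ~/.bashrc; the sketch below is only a guess at what your file may contain, reusing the two sourcing lines you posted earlier. If they sit below such a guard, they never run for non-interactive ssh commands and would need to be moved above it:

  # ~/.bashrc (typical Debian layout; illustrative)
  [ -z "$PS1" ] && return   # non-interactive shells stop here
  ...
  . /opt/intel/fce/10.1.022/bin/ifortvars.sh   # never reached non-interactively
  . /opt/intel/cce/10.1.022/bin/iccvars.sh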


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Douglas Guptill
On Thu, Apr 16, 2009 at 05:29:14PM +0200, Francesco Pietra wrote:
> On Thu, Apr 16, 2009 at 3:04 PM, Jeff Squyres  wrote:
...
> Given my inexperience as system analyzer, I assume that I am messing
> something. Unfortunately, i was unable to discover where I am messing.
> An editor is waiting completion of calculations requested by a
> referee, and I am unable to answer.
> 
> thanks a lot for all you have tried to put me on the right road

I wonder if the confusion stems from the requirement to "source" the
intel compiler setup files in (at least) two situations:
  1. when compiling the (MPI) application
  2. when running the (MPI) application

My solution to the second has been to create - as part of the build
process for my application - a "run" script for it.  That script
sources the intel setup files, then runs the application.

Here is part of the script that runs my application:

==
# If it is defined, source the intel setup script.
#
if test "x/opt/intel/Compiler/11.0/074/bin/ifortvars.sh intel64" != x ; then
    echo "setup the intel compiler with "
    . /opt/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
    if test -z $(echo ${LD_LIBRARY_PATH} | grep intel) ; then
        echo "Don't see intel in LD_LIBRARY_PATH=<${LD_LIBRARY_PATH}>"
        echo "you may have trouble"
    fi
fi
...
# run my program
==

I am running only on the 4 cores of one machine, so this solution may
not work for MPI applications that run on multiple machines.

Hope that helps,
Douglas.


Re: [OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Francesco Pietra
As a quick answer before I go to study your and Douglas's mails: the
desktop toward which I was sshing is a single-processor, 10-year-old
desktop with nothing related to intel or openmpi on it. It only has ssh
software to run Amber procedures on remote machines. I can try your
suggested ssh toward a parallel computer, but, in any case, my running
of openmpi and Amber is normally on single-node, UMA-type computers.
Nothing outside them is involved in the parallel calculation. It is
the situation described by Douglas. The computation should work (and
it worked in the past) even if the computer is removed from the
internal network.
francesco




Re: [OMPI users] Intel compiler libraries (was: libnuma issue)

2009-04-16 Thread Francesco Pietra
On Thu, Apr 16, 2009 at 5:37 PM, Jeff Squyres  wrote:
> On Apr 16, 2009, at 11:29 AM, Francesco Pietra wrote:
>
>> francesco@tya64:~$ ssh 192.168.1.33 env | sort
>> HOME=/home/francesco
>> LANG=en_US.UTF-8
>> LOGNAME=francesco
>> MAIL=/var/mail/francesco
>> PATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
>> PWD=/home/francesco
>> SHELL=/bin/bash
>> SHLVL=1
>> SSH_CLIENT=192.168.1.37 33941 22
>> SSH_CONNECTION=192.168.1.37 33941 192.168.1.33 22
>> USER=francesco
>> _=/usr/bin/env
>> francesco@tya64:~$
>>
>
> I don't see the intel compiler variables set in there, nor an
> LD_LIBRARY_PATH indicating where the intel libraries are located.  See my
> text from the last mail:
>
>> > I'm guessing that you're not sourcing the intel .sh files for
>> > non-interactive logins.  You'll need to check your shell startup files
>> > and
>> > ensure that those sourcing lines are executed when you login to remote
>> > nodes
>> > non-interactively.  E.g.:
>> >
>> >  thisnode$ ssh othernode env | sort
>> >
>> > shows the relevant stuff in your environment on the other node.  Note
>> > that
>> > this is different than
>> >
>> >  thisnode$ ssh othernode
>> >  othernode$ env | sort
>>
>
> You might well have some logic in your .bashrc that quits before fully
> executing when running non-interactive logins; hence, the ".
> /opt/intel/fce/10.1.022/bin/ifortvars.sh" lines don't execute on the
> 192.168.1.33 machine when you run non-interactive jobs.

That is true, horrendously true. While the other parallel machine to
which I can slogin passwordless responds correctly to "env | sort"
typed at its own keyboard, it does not do so when I run "thisnode$ ssh
othernode env | sort" from another computer: the answer is the same as
from the non-parallel desktop. I can't say whether "thisnode$ ssh
othernode env | sort" worked well before, or whether it matters for the
way I intend to use these computers singly. At any event, I would like
to fix the connection. Is that described in a howto on setting up ssh?
>
> --
> Jeff Squyres
> Cisco Systems



Re: [OMPI users] MPI_Comm_spawn and orted

2009-04-16 Thread Jerome BENOIT

Thanks for the info.

meanwhile I have set:

mpi_param_check = 0

in my system-wide configuration file on the workers, and

mpi_param_check = 1

on the master.
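
As a sketch, assuming the Debian-style file location mentioned earlier in this thread, that amounts to:

  # on each worker node, in /etc/openmpi/openmpi-mca-params.conf
  mpi_param_check = 0

  # on the master node, in the same file
  mpi_param_check = 1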

Jerome


Ralph Castain wrote:

Thanks! That does indeed help clarify.

You should also then configure OMPI with 
--disable-per-user-config-files. MPI procs will automatically look at 
the default MCA parameter file, which is probably on your master node 
(wherever mpirun was executed). However, they also look at the user's 
home directory for any user default param file and/or binary modules. So 
the home directory will again be automounted, this time by the MPI procs.


We created that option specifically to address the problem you describe. 
Hope it helps.








Re: [OMPI users] Debugging memory use of Open MPI

2009-04-16 Thread Eugene Loh

Eugene Loh wrote:


Shaun Jackman wrote:


What's the purpose of the 400 MB that MPI_Init has allocated?


It's for... um, I don't know.  Let's see...

About a third of it appears to be
vt_open() -> VTThrd_open() -> VTGen_open
which I'm guessing is due to the VampirTrace instrumentation (maybe 
allocating the buffers into which the MPI tracing data is collected).  
It seems to go away if one doesn't collect message-tracing data.


Somehow, I can't see further into the library.  Hmm.  It does seem 
like a bunch.  The shared-memory area (which MPI_Init allocates for 
on-node message passing) is much smaller.  The remaining roughly 130 
Mbyte/process seems to be independent of the number of processes.


An interesting exercise for the reader.


Arrgh.  What a pathetic response!  Lemme see if I can do better than that.

As I said, about a "third" (whatever that means) is for vt_open(), and 
I'm pretty sure that's for the VampirTrace message tracing.  If we don't 
collect message traces, that memory isn't allocated.


What's the rest?  I said the shared-memory area is much smaller, but I 
was confused about which OMPI release I was using.  So, the 
shared-memory area was 128 Mbyte and it was getting mapped in once for 
each process, and so it was counted doubly.


Plus, even a "hello world" program seems to have some inexplicably large 
amount of memory (10-20 Mbytes?).


So:

- about 10-20 Mbytes just to start the simplest program up
- other miscellaneous MPI stuff
- 128 Mbyte for the shared-memory area, counted twice
- about 150 Mbyte for VT buffers

Now, another question you might have is why the shared-memory area is so 
big.  The idea is that processes communicate via shared memory by having 
one process write to the shared area and the other read from it.  It can 
be advantageous to provide ample room (e.g., to minimize synchronization 
among processes... otherwise, processes end up having to wait for 
congested resources to clear or to do extra work to avoid the 
congestion).  "Ample" room means ample for lots of data and/or for lots 
of (short) messages.  How much is enough?  No idea.  YMMV.  The more the 
better.  Etc.  Someone picked some numbers and that's what you live with 
by default.  So, why so big?  Answer:  just because we picked it to be 
that way.
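
If you want to see which MCA parameters control those sizes on your own installation (the exact set depends on the Open MPI version), ompi_info lists them together with their current defaults, e.g.:

  ompi_info --param mpool sm
  ompi_info --param btl sm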


Re: [OMPI users] Debugging memory use of Open MPI

2009-04-16 Thread Shaun Jackman

Eugene Loh wrote:
...
What's the rest?  I said the shared-memory area is much smaller, but I 
was confused about which OMPI release I was using.  So, the 
shared-memory area was 128 Mbyte and it was getting mapped in once for 
each process, and so it was counted doubly.


If there are eight processes running on one host, does each process 
allocate one 128 Mbyte shared memory buffer and map in the other seven 
128 Mbyte buffers allocated by the other processes?


Cheers,
Shaun


Re: [OMPI users] Debugging memory use of Open MPI

2009-04-16 Thread Eugene Loh

Shaun Jackman wrote:


Eugene Loh wrote:

What's the rest?  I said the shared-memory area is much smaller, but 
I was confused about which OMPI release I was using.  So, the 
shared-memory area was 128 Mbyte and it was getting mapped in once 
for each process, and so it was counted doubly.


If there are eight processes running on one host, does each process 
allocate one 128 Mbyte shared memory buffer and map in the other seven 
128 Mbyte buffers allocated by the other processes?


No.  The total size for one, single shared file is computed and the 
lowest rank on the node creates the file and mmaps it in.  Then, the 
other processes mmap the same file in.


The code is set up to have different "memory pools".  E.g., look at 
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/sm/btl_sm.c#sm_btl_first_time_init 
.  So, conceivably you could have different buffers on the same node.  
But in practice it's just one buffer and, in any case, they're always 
created by the lowest rank.  E.g., 
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/common/sm/common_sm_mmap.c#143 
.