Re: [OMPI users] Hide Abort output

2010-04-07 Thread Yves Caniou
Indeed, it seems that it addresses what I want!

I read the discussions on the MPI Forum list, which are very interesting.
I began to develop a termination code before seeing that the use of 
MPI_Abort() should be sufficient. 
But I didn't post anything, since my case is particular: I have iterative 
computations. Thus, I can check if any termination message has been received 
at some points (with the async receive at the beginning of the program) -- 
the sending of messages has to be done in a "recursive" way to ensure a 
smaller number of messages exchanged between tasks, because there's 
no "multicast" way of sending something.

In my case, I don't need special ending requirements if tasks share files, 
etc., which would not be the general case for a standardized API.
But I still think that an MPI_Quit() would be very useful.

Thank you very much!

.Yves.

On Tuesday 06 April 2010 22:40:29, Jeff Squyres wrote:
> BTW, we diverged quite a bit on this thread -- Yves -- does the
> functionality that was fixed by Ralph address your original issue?
>
> On Apr 2, 2010, at 10:21 AM, Ralph Castain wrote:
> > Testing found that I had missed a spot here, so we weren't fully
> > suppressing messages (including MPI_Abort). So the corrected fix is in
> > r22926, and will be included in tonight's tarball.
> >
> > I also made --quiet be a new MCA param orte_execute_quiet so you can put
> > it in your environment instead of only on the cmd line.
> >
> > HTH
> > Ralph
> >
> > On Apr 2, 2010, at 1:18 AM, Ralph Castain wrote:
> > > Actually, a cmd line option to mpirun already existed for this purpose.
> > > Unfortunately, it wasn't properly being respected, so even knowing
> > > about it wouldn't have helped.
> > >
> > > I have fixed this as of r22925 on our developer's trunk and started the
> > > script to generate a fresh nightly tarball. Give it a little time and
> > > then you can find it on the web site:
> > >
> > > http://www.open-mpi.org/nightly/trunk/
> > >
> > > Use the -q or --quiet option and the message will be suppressed. I will
> > > request that this be included in the upcoming 1.4.2 and 1.5.0 releases.
> > >
> > > On Apr 1, 2010, at 8:38 PM, Yves Caniou wrote:
> > >> For information, I use the debian-packaged OpenMPI 1.4.1.
> > >>
> > >> Cheers.
> > >>
> > >> .Yves.
> > >>
> > >> On Wednesday 31 March 2010 12:41:34, Jeff Squyres (jsquyres) wrote:
> > >>> At present there is no such feature, but it should not be hard to
> > >>> add.
> > >>>
> > >>> Can you guys be a little more specific about exactly what you are
> > >>> seeing and exactly what you want to see?  (And what version you're
> > >>> working with - I'll caveat my discussion that this may be a
> > >>> 1.5-and-forward thing)
> > >>>
> > >>> -jms
> > >>> Sent from my PDA.  No type good.
> > >>>
> > >>> - Original Message -
> > >>> From: users-boun...@open-mpi.org 
> > >>> To: Open MPI Users 
> > >>> Sent: Wed Mar 31 05:38:48 2010
> > >>> Subject: Re: [OMPI users] Hide Abort output
> > >>>
> > >>>
> > >>> I have to say this is a very common issue for our users.  They
> > >>> repeatedly report the long Open MPI MPI_Abort() message in help
> > >>> queries and fail to look for the application error message about the
> > >>> root cause.  A short MPI_Abort() message that said "look elsewhere
> > >>> for the real error message" would be useful.
> > >>>
> > >>> Cheers,
> > >>> David
> > >>>
> > >>> On 03/31/2010 07:58 PM, Yves Caniou wrote:
> >  Dear all,
> > 
> >  I am using the MPI_Abort() command in an MPI program.
> >  I would like to not see the note explaining that the command caused
> >  Open MPI to kill all the jobs and so on.
> >  I thought that I could find a --mca parameter, but couldn't grep it.
> >  The only ones deal with the delay and printing more information (the
> >  stack).
> > 
> >  Is there a way to avoid the printing of the note (except the
> >  2>/dev/null trick)? Or to delay this printing?
> > 
> >  Thank you.
> > 
> >  .Yves.
> > >>>
> > >>
> > >> --
> > >> Yves Caniou
> > >> Associate Professor at Université Lyon 1,
> > >> Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> > >> Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> > >> * in Information Technology Center, The University of Tokyo,
> > >>   2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > >>   tel: +81-3-5841-0540
> > >> * in National Institute of Informatics
> > >>   2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > >>   tel: +81-3-4212-2412
> > >> http://graal.ens-lyon.fr/~ycaniou/
> > >>

[OMPI users] OpenMPI multithreaded performance

2010-04-07 Thread Piero Lanucara


Dear OpenMPI team,
how much performance should we expect from using the MPI multithread capability 
(MPI_Init_thread with MPI_THREAD_MULTIPLE)?
It seems that no performance gain appears in some simple tests, such as 
multiple MPI channels activated, overlapping communication and computation, 
and so on.
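
To be concrete, the pattern in question is (a minimal sketch, not our
actual test):

  /* sketch: request full multithreading and check what the library grants */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      if (provided < MPI_THREAD_MULTIPLE)
          printf("granted only thread level %d\n", provided);
      MPI_Finalize();
      return 0;
  }

(If I understand correctly, Open MPI must also be built with thread support,
e.g. configured with --enable-mpi-threads, for MPI_THREAD_MULTIPLE to be
granted at all.)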


Thanks in advance

Piero


Piero LANUCARA
Consorzio interuniversitario per le Applicazioni di Supercalcolo Per Università 
e Ricerca

via dei Tizii n.6b 00185 Roma (Italy)

phone:  06-44486709, fax: 06-4957083
cell:   3403006589
E-mail:piero.lanuc...@caspur.it

Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Serge

> If you run your cmd with the hostfile option and add
> --display-allocation, what does it say?

Thank you, Ralph.

This is the command I used inside my submission script:

  mpirun --display-allocation -np 4 -hostfile hosts ./program

And this is the output I got.

 Data for node: Name: node03  Num slots: 4   Max slots: 0
 Data for node: Name: node02  Num slots: 4   Max slots: 0
 Data for node: Name: node04  Num slots: 4   Max slots: 0
 Data for node: Name: node01  Num slots: 4   Max slots: 0

If I run the same mpirun command on the cluster head node "clhead" then 
this is what I get:


 Data for node: Name: clhead  Num slots: 0   Max slots: 0
 Data for node: Name: node01  Num slots: 1   Max slots: 0
 Data for node: Name: node02  Num slots: 1   Max slots: 0
 Data for node: Name: node03  Num slots: 1   Max slots: 0
 Data for node: Name: node04  Num slots: 1   Max slots: 0

The content of the 'hosts' file:

 node01 slots=1
 node02 slots=1
 node03 slots=1
 node04 slots=1

= Serge


On Apr 6, 2010, at 12:18 PM, Serge wrote:


Hi,

OpenMPI integrates with Sun Grid Engine really well, and one does not 
need to specify any parameters for the mpirun command to launch the 
processes on the compute nodes; that is, having in the submission script 
"mpirun ./program" is enough; there is no need for "-np XX" or 
"-hostfile file_name".


However, there are cases when being able to specify the hostfile is 
important (hybrid jobs, users with MPICH jobs, etc.). For example, with 
Grid Engine I can request four 4-core nodes, that is, a total of 16 slots. 
But I also want to specify how to distribute processes on the nodes, so 
I create the file 'hosts'


node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1

and modify the line in the submission script to:
mpirun -hostfile hosts ./program

With Open MPI 1.2.x everything worked properly, meaning that Open MPI 
could count the number of slots specified in the 'hosts' file - 4 (i.e. 
effectively supplying the mpirun command with the -np parameter), as 
well as properly distribute processes on the compute nodes (one process 
per host).


It's different with Open MPI 1.4.1. It cannot process the 'hosts' file 
properly at all. All the processes get launched on just one node -- the 
shepherd host.


The format of the 'hosts' file does not matter. It can be, say

node01
node01
node02
node02

meaning 2 slots on each node. Open MPI 1.2.x would handle that with no 
problem, however Open MPI 1.4.x would not.


The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with 
OMPI 1.3.4 and SGE 6.2u4.


It's important to notice that if the mpirun command is run 
interactively, not from inside the Grid Engine script, then it 
interprets the content of the host file just fine.


I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents the 
expected behavior, and whether it is possible to get it back in OMPI 1.4.x 
by, say, tuning some parameters.


= Serge



Re: [OMPI users] OpenMPI multithreaded performance

2010-04-07 Thread Tim Prince

On 4/7/2010 1:20 AM, Piero Lanucara wrote:


Dear OpenMPI team,
how much performance should we expect from using the MPI multithread 
capability (MPI_Init_thread with MPI_THREAD_MULTIPLE)?
It seems that no performance gain appears in some simple tests, such as 
multiple MPI channels activated, overlapping communication and 
computation, and so on.


Maybe I don't understand your question.  Are you saying that none of the 
references found by search terms such as "hybrid mpi openmp" are useful 
for you?  They cover so many topics, you would have to be much more 
specific about which topics you want in more detail.


--
Tim Prince



Re: [OMPI users] OpenMPI multithreaded performance

2010-04-07 Thread Piero Lanucara

Thanks for your answer, and sorry for the misunderstanding.

My question is about the performance of this implementation when using its 
multithreading capabilities.


Of course MPI+OpenMP is one choice, but (from what I understand) you should 
also be able to obtain a lot of performance using MPI's multithreading 
capabilities (for example, in the communication part of your code) without 
using OpenMP.


So the question is whether this is true with the current implementation of 
Open MPI.
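
For example, I would expect a sketch like this one (made-up sizes, and
assuming MPI_THREAD_MULTIPLE is granted) to hide communication behind
computation without OpenMP:

  /* sketch: a helper thread communicates while the main thread computes */
  #include <mpi.h>
  #include <pthread.h>
  #include <stdlib.h>

  #define N 1000000
  static double buf[N];

  static void *comm_thread(void *arg)
  {
      int peer = *(int *)arg;
      /* exchange buf with the peer while the main thread computes */
      MPI_Sendrecv_replace(buf, N, MPI_DOUBLE, peer, 0, peer, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      return NULL;
  }

  int main(int argc, char **argv)
  {
      int provided, rank, size, peer;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      peer = rank ^ 1;              /* pair ranks 0-1, 2-3, ... */
      if (provided >= MPI_THREAD_MULTIPLE && peer < size) {
          pthread_t t;
          pthread_create(&t, NULL, comm_thread, &peer);
          /* ... compute here on data other than buf ... */
          pthread_join(t, NULL);
      }

      MPI_Finalize();
      return 0;
  }

(Compile with -pthread; again, only a sketch.)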


Thanks again

Piero



Maybe I don't understand your question.  Are you saying that none of the 
references found by search terms such as "hybrid mpi openmp" are useful for 
you?  They cover so many topics, you would have to be much more specific 
about which topics you want in more detail.


--
Tim Prince




Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Dave Love
Serge  writes:

> However, there are cases when being able to specify the hostfile is
> important (hybrid jobs, users with MPICH jobs, etc.).

[I don't understand what MPICH has to do with it.]

> For example,
> with Grid Engine I can request four 4-core nodes, that is total of 16
> slots. But I also want to specify how to distribute processes on the
> nodes, so I create the file 'hosts'
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program

Regardless of any open-mpi bug, I'd have thought it was easier just to
use -npernode in that case.  What's the problem with that?  It seems to
me generally better to control the distribution of processes with mpirun
on the SGE-allocated nodes than to concoct host files as we used to do
here, e.g. to get -byslot v. -bynode behaviour (or vice versa).
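
For the one-process-per-node layout above, that would just be

  mpirun -npernode 1 ./program

with no host file at all (assuming an Open MPI recent enough to have
-npernode, which the 1.3/1.4 series should be).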



Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Serge

>> However, there are cases when being able to specify the hostfile is
>> important (hybrid jobs, users with MPICH jobs, etc.).

>[I don't understand what MPICH has to do with it.]

This was just an example of how the different behavior of OMPI 1.4 may 
cause problems. The MPICH library is not the subject of discussion. 
MPICH requires the use of a hostfile, which is generated by SGE, and 
having it in the submission script for an Open MPI 1.2.x job has the 
expected effect. This is different for Open MPI 1.4.x, which appears not 
to interpret the host file properly.


>> For example,
>> with Grid Engine I can request four 4-core nodes, that is total of 16
>> slots. But I also want to specify how to distribute processes on the
>> nodes, so I create the file 'hosts'
>>
>> node01 slots=1
>> node02 slots=1
>> node03 slots=1
>> node04 slots=1
>>
>> and modify the line in the submission script to:
>> mpirun -hostfile hosts ./program

> Regardless of any open-mpi bug, I'd have thought it was easier just to
> use -npernode in that case. What's the problem with that? It seems to
> me generally better to control the distribution of processes with mpirun
> on the SGE-allocated nodes than to concoct host files as we used to do
> here, e.g. to get -byslot v. -bynode behaviour (or vice versa).

This is exactly what I am doing -- controlling distribution of processes 
with mpirun on the SGE-allocated nodes, by supplying the hostfile. Grid 
Engine allocates nodes and generates a hostfile, which I can then modify 
however I want before running the mpirun command. Moreover, it gives 
more control, by allowing me to create specific SGE parallel environments, 
where the process distribution is predetermined -- one less worry for 
users playing with mpirun options.
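
For instance (a sketch, not my exact script -- SGE exports the generated 
machine file as $PE_HOSTFILE, whose first column is the host name):

  awk '{print $1, "slots=1"}' $PE_HOSTFILE > hosts
  mpirun -hostfile hosts ./program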


The example in my initial email was deliberately simplified to 
demonstrate the problem.


= Serge


Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Ralph Castain
I should have read your original note more closely and I would have spotted the 
issue. How a hostfile is used changed between OMPI 1.2 and the 1.3 (and above) 
releases per user requests. It was actually the SGE side of the community that 
led the change :-)

You can get a full description of how OMPI uses hostfiles in two ways:

* from the man pages:  man orte_hosts

* from the wiki: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

As far as I can tell, OMPI 1.4.x is behaving per that specification. You get 
four slots per node in your submission script because that is what SGE 
allocated to you. The hostfile filters that when launching, using the provided 
info to tell it how many slots on each node within the allocation to use for 
that application.
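
If I read that scheme right, an illustration with your numbers (abbreviating 
to two of your four nodes): SGE hands mpirun an allocation of

  node01  slots=4
  node02  slots=4

and a hostfile of

  node01 slots=1
  node02 slots=1

filters it down to one usable slot on each of node01 and node02 -- the 
hostfile can subtract from the allocation, but never add nodes or slots 
to it.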

I suggest reading the above documentation to see how OMPI uses hostfiles, and 
then let us know if you have any questions, concerns, or see a deviation from 
the described behavior.

HTH
Ralph

On Apr 7, 2010, at 5:36 AM, Serge wrote:

> > If you run your cmd with the hostfile option and add
> > --display-allocation, what does it say?
> 
> Thank you, Ralph.
> 
> This is the command I used inside my submission script:
> 
>  mpirun --display-allocation -np 4 -hostfile hosts ./program
> 
> And this is the output I got.
> 
> Data for node: Name: node03  Num slots: 4   Max slots: 0
> Data for node: Name: node02  Num slots: 4   Max slots: 0
> Data for node: Name: node04  Num slots: 4   Max slots: 0
> Data for node: Name: node01  Num slots: 4   Max slots: 0
> 
> If I run the same mpirun command on the cluster head node "clhead" then this 
> is what I get:
> 
> Data for node: Name: clhead  Num slots: 0   Max slots: 0
> Data for node: Name: node01  Num slots: 1   Max slots: 0
> Data for node: Name: node02  Num slots: 1   Max slots: 0
> Data for node: Name: node03  Num slots: 1   Max slots: 0
> Data for node: Name: node04  Num slots: 1   Max slots: 0
> 
> The content of the 'hosts' file:
> 
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
> 
> = Serge
> 
> 
> On Apr 6, 2010, at 12:18 PM, Serge wrote:
> 
>> Hi,
>> OpenMPI integrates with Sun Grid Engine really well, and one does not need 
>> to specify any parameters for the mpirun command to launch the processes on 
>> the compute nodes, that is having in the submission script "mpirun 
>> ./program" is enough; there is no need for "-np XX" or "-hostfile file_name".
>> However, there are cases when being able to specify the hostfile is 
>> important (hybrid jobs, users with MPICH jobs, etc.). For example, with Grid 
>> Engine I can request four 4-core nodes, that is total of 16 slots. But I 
>> also want to specify how to distribute processes on the nodes, so I create 
>> the file 'hosts'
>> node01 slots=1
>> node02 slots=1
>> node03 slots=1
>> node04 slots=1
>> and modify the line in the submission script to:
>> mpirun -hostfile hosts ./program
>> With Open MPI 1.2.x everything worked properly, meaning that Open MPI could 
>> count the number of slots specified in the 'hosts' file - 4 (i.e. 
>> effectively supplying the mpirun command with the -np parameter), as well as 
>> properly distribute processes on the compute nodes (one process per host).
>> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file 
>> properly at all. All the processes get launched on just one node -- the 
>> shepherd host.
>> The format of the 'hosts' file does not matter. It can be, say
>> node01
>> node01
>> node02
>> node02
>> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no 
>> problem, however Open MPI 1.4.x would not.
>> The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with OMPI 
>> 1.3.4 and SGE 6.2u4.
>> It's important to notice that if the mpirun command is run interactively, 
>> not from inside the Grid Engine script, then it interprets the content of 
>> the host file just fine.
>> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents 
>> expected behavior, and is it possible to get it from OMPI 1.4.x by, say, 
>> tuning some parameters?
>> = Serge




Re: [OMPI users] Best way to reduce 3D array

2010-04-07 Thread Cole, Derek E
Thanks for the ideas. I did finally end up getting this working by sending back 
to the master process. It's quite ugly, and added a good bit of MPI to the 
code, but it works for now, and I will revisit this later. I am not sure what 
the file system is, I think it is XFS, but I don't know much about why this has 
an effect on the output - just the way files can be opened at once or 
something? 

I did have to end up using an MPI Data type, because this 3D domain was strided 
nicely in X, but not the other dimensions. The domain is larger in Z, so I 
wanted to order my loops such that Z is the innermost. This helped cut down 
some of the MPI overhead. It would have been nice to avoid this, but I could 
not think of the way to do it, and still have all of the computes working on 
the largest section of data possible.
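
For the archives, the datatype was along these lines (not my exact code; 
the sizes are made up, and the array is stored C-style with Z contiguous):

  /* datatype for one y-plane of a C array a[NX][NY][NZ], Z fastest varying */
  #include <mpi.h>

  #define NX 64
  #define NY 64
  #define NZ 256

  static double a[NX][NY][NZ];

  void send_y_plane(int j, int dest)
  {
      MPI_Datatype plane;
      /* NX blocks of NZ contiguous doubles, stride NY*NZ between blocks */
      MPI_Type_vector(NX, NZ, NY * NZ, MPI_DOUBLE, &plane);
      MPI_Type_commit(&plane);
      MPI_Send(&a[0][j][0], 1, plane, dest, 0, MPI_COMM_WORLD);
      MPI_Type_free(&plane);
  }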

Derek

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ricardo Reis
Sent: Monday, April 05, 2010 3:20 PM
To: Open MPI Users
Subject: Re: [OMPI users] Best way to reduce 3D array

On Mon, 5 Apr 2010, Rob Latham wrote:

> On Tue, Mar 30, 2010 at 11:51:39PM +0100, Ricardo Reis wrote:
>>
>>  If using the master/slave IO model, would it be better to cycle 
>> through all the processes and each one would write its part of the 
>> array into the file. This file would be open in "stream" mode...
>>
>>  like
>>
>>  do p=0,nprocs-1
>>
>>if(my_rank.eq.p)then
>>
>>  openfile (append mode)
>>  write_to_file
>>  closefile
>>
>>endif
>>
>>call MPI_Barrier(world,ierr)
>>
>>  enddo
>
> Note that there's no guarantee of the order here, though. Nothing 
> prevents rank 30 from hitting that loop before rank 2 does.  To ensure

don't they all have to hit the same Barrier? I think that will ensure order in 
this business... or am I being blind to something?

I will agree, though, this is not the best solution. I use this kind 
of arrangement when I'm desperate to do some printf kind of debugging and want 
it ordered by process. Never had a problem with it.

I mean, I assume there is some sort of sync before the do cycle starts.


  cheers!

  Ricardo Reis

  'Non Serviam'

  PhD candidate @ Lasef
  Computational Fluid Dynamics, High Performance Computing, Turbulence
  http://www.lasef.ist.utl.pt

  Cultural Instigator @ Rádio Zero
  http://www.radiozero.pt

  Keep them Flying! Ajude a/help Aero Fénix!

  http://www.aeronauta.com/aero.fenix

  http://www.flickr.com/photos/rreis/

< sent with alpine 2.00 >



Re: [OMPI users] Problem in remote nodes

2010-04-07 Thread Robert Collyer

Jeff,
In my case, it was the firewall.  It was restricting communication to 
ssh only between the compute nodes.  I appreciate the help.
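
(For the archives: a quick way to check raw TCP connectivity between two 
nodes, without MPI, is netcat -- node1/node2 stand for any two compute 
nodes, the port number is arbitrary, and some netcat variants want 
"nc -l -p 12345" instead:

  node1$ nc -l 12345
  node2$ echo hello | nc node1 12345

If "hello" never shows up on the first node, a firewall or the like is in 
the way.)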


Rob

Jeff Squyres (jsquyres) wrote:


Those are normal ssh messages, I think -- an ssh session may try 
multiple auth methods before one succeeds.


You're absolutely sure that there's no firewalling software and 
selinux is disabled?  Ompi is behaving as if it is trying to 
communicate and failing (e.g., it's hanging while trying to open some 
tcp sockets back).


Can you open random tcp sockets between your nodes?  (E.g., in non-mpi 
processes)


-jms
Sent from my PDA.  No type good.

- Original Message -
From: users-boun...@open-mpi.org 
To: Open MPI Users 
Sent: Wed Mar 31 06:25:43 2010
Subject: Re: [OMPI users] Problem in remote nodes

I've been checking the /var/log/messages on the compute node and there is
nothing new after executing 'mpirun --host itanium2 -np 2
helloworld.out',
but in the /var/log/messages file on the remote node the following
messages appear; nothing about unix_chkpwd.

Mar 31 11:56:51 itanium2 sshd(pam_unix)[15349]: authentication failure;
logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=itanium1  user=otro
Mar 31 11:56:53 itanium2 sshd[15349]: Accepted publickey for otro from
192.168.3.1 port 40999 ssh2
Mar 31 11:56:53 itanium2 sshd(pam_unix)[15351]: session opened for user
otro by (uid=500)
Mar 31 11:56:53 itanium2 sshd(pam_unix)[15351]: session closed for 
user otro


It seems that the authentication fails at first, but in the next message
it connects with the node...

El Mar, 30 de Marzo de 2010, 20:02, Robert Collyer escribió:
> I've been having similar problems using Fedora core 9.  I believe the
> issue may be with SELinux, but this is just an educated guess.  In my
> setup, shortly after a login via mpi, there is a notation in the
> /var/log/messages on the compute node as follows:
>
> Mar 30 12:39:45  kernel: type=1400 audit(1269970785.534:588):
> avc:  denied  { read } for  pid=8047 comm="unix_chkpwd" name="hosts"
> dev=dm-0 ino=24579
> scontext=system_u:system_r:system_chkpwd_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:etc_runtime_t:s0 tclass=file
>
> which says SELinux denied unix_chkpwd read access to hosts.
>
> Are you getting anything like this?
>
> In the meantime, I'll check if allowing unix_chkpwd read access to hosts
> eliminates the problem on my system, and if it works, I'll post the
> steps involved.
>
> uriz.49...@e.unavarra.es wrote:
>> I've been investigating and there is no firewall that could stop TCP
>> traffic in the cluster. With the option --mca plm_base_verbose 30 I get
>> the following output:
>>
>> [itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host 
itanium2

>> helloworld.out
>> [itanium1:08311] mca: base: components_open: Looking for plm components
>> [itanium1:08311] mca: base: components_open: opening plm components
>> [itanium1:08311] mca: base: components_open: found loaded component rsh
>> [itanium1:08311] mca: base: components_open: component rsh has no
>> register
>> function
>> [itanium1:08311] mca: base: components_open: component rsh open 
function

>> successful
>> [itanium1:08311] mca: base: components_open: found loaded component
>> slurm
>> [itanium1:08311] mca: base: components_open: component slurm has no
>> register function
>> [itanium1:08311] mca: base: components_open: component slurm open
>> function
>> successful
>> [itanium1:08311] mca:base:select: Auto-selecting plm components
>> [itanium1:08311] mca:base:select:(  plm) Querying component [rsh]
>> [itanium1:08311] mca:base:select:(  plm) Query of component [rsh] set
>> priority to 10
>> [itanium1:08311] mca:base:select:(  plm) Querying component [slurm]
>> [itanium1:08311] mca:base:select:(  plm) Skipping component [slurm].
>> Query
>> failed to return a module
>> [itanium1:08311] mca:base:select:(  plm) Selected component [rsh]
>> [itanium1:08311] mca: base: close: component slurm closed
>> [itanium1:08311] mca: base: close: unloading component slurm
>>
>> --Hangs here
>>
>> It seems a slurm problem??
>>
>> Thanks to any idea
>>
>> El Vie, 19 de Marzo de 2010, 17:57, Ralph Castain escribió:
>>
>>> Did you configure OMPI with --enable-debug? You should do this so that
>>> more diagnostic output is available.
>>>
>>> You can also add the following to your cmd line to get more info:
>>>
>>> --debug --debug-daemons --leave-session-attached
>>>
>>> Something is likely blocking proper launch of the daemons and 
processes

>>> so
>>> you aren't getting to the btl's at all.
>>>
>>>
>>> On Mar 19, 2010, at 9:42 AM, uriz.49...@e.unavarra.es wrote:
>>>
>>>
 The processes are running on the remote nodes but they don't send any
 response back to the origin node. I don't know why.
 With the option --mca btl_base_verbose 30, I have the same problems
 and
 it
 doesn't show any message.

 Thanks


> On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres 
> wrote:
>
>> On Mar 17

Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-07 Thread Serge

Thank you, Ralph.

I have read the wiki and the man pages. But I am still not sure I 
understand what is going on in my example. I cannot filter the slots 
allocated by SGE. I also think that there is a deviation from the 
behavior described on the wiki (precisely example 5 from the top in 
section "NOW RUNNING FROM THE INTERACTIVE SHELL").


So, below, I am copy-pasting my session, and I am asking if you could 
please follow my line of thought and correct me where I am mistaken?


Here I request an interactive session with 16 slots on 4 four-core nodes 
like so:


   $ qrsh -cwd -V -pe ompi* 16 -l h_rt=10:00:00,h_vmem=2G bash

Now, I show that all 16 slots are available and everything is working as 
expected with both OMPI 1.2.9 and OMPI 1.4.1:


   graphics01 $ ~/openmpi/gnu141/bin/mpirun hostname
   [graphics01:24837] ras:gridengine: JOB_ID: 89052
   graphics01
   graphics01
   graphics01
   graphics01
   graphics04
   graphics04
   graphics02
   graphics02
   graphics04
   graphics02
   graphics04
   graphics02
   graphics03
   graphics03
   graphics03
   graphics03

   graphics01 $ ~/openmpi/gnu129/bin/mpirun hostname
   [graphics01:24849] ras:gridengine: JOB_ID: 89052
   graphics01
   graphics04
   graphics02
   graphics03
   graphics01
   graphics04
   graphics02
   graphics03
   graphics01
   graphics03
   graphics01
   graphics04
   graphics03
   graphics04
   graphics02
   graphics02

Now, I want to filter the list of 16 slots by using the host file. I 
want to run 1 process per node.


   graphics01 $ cat hosts
   graphics01 slots=1
   graphics02 slots=1
   graphics03 slots=1
   graphics04 slots=1

And I try to use it with OMPI 1.2.9 and 1.4.1

   graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts hostname
   graphics04
   graphics01
   graphics03
   graphics02

   graphics01 $ ~/openmpi/gnu141/bin/mpirun -hostfile hosts hostname
   [graphics01:24903] ras:gridengine: JOB_ID: 89052
   graphics01

So, as you can see, OMPI 1.4.1 did not recognize any hosts except the 
current shepherd host.


Moreover, similar to the example further down on 
https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan,

I create two other host files:

   graphics01 $ cat hosts1
   graphics02
   graphics02

   graphics01 $ cat hosts2
   graphics02 slots=2

And then try to use them with both versions of Open MPI:

It works properly with OMPI 1.2.9 (the same way as shown on the wiki!), 
but does NOT with 1.4.1


   graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts1 hostname
   graphics02
   graphics02

   graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts2 hostname
   graphics02
   graphics02

   graphics01 $ ~/openmpi/gnu141/bin/mpirun -hostfile hosts1 hostname
   [graphics01:25756] ras:gridengine: JOB_ID: 89055

--
   There are no allocated resources for the application
 hostname
   that match the requested mapping:
 hosts1

   Verify that you have mapped the allocated resources properly using the
   --host or --hostfile specification.

--

--
   A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
   launch so we are aborting.

= Serge


Ralph Castain wrote:

I should have read your original note more closely and I would have spotted the 
issue. How a hostfile is used changed between OMPI 1.2 and the 1.3 (and above) 
releases per user requests. It was actually the SGE side of the community that 
led the change :-)

You can get a full description of how OMPI uses hostfiles in two ways:

* from the man pages:  man orte_hosts

* from the wiki: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

As far as I can tell, OMPI 1.4.x is behaving per that specification. You get 
four slots per node in your submission script because that is what SGE 
allocated to you. The hostfile filters that when launching, using the provided 
info to tell it how many slots on each node within the allocation to use for 
that application.

I suggest reading the above documentation to see how OMPI uses hostfiles, and 
then let us know if you have any questions, concerns, or see a deviation from 
the described behavior.

HTH
Ralph

On Apr 7, 2010, at 5:36 AM, Serge wrote:


If you run your cmd with the hostfile option and add
--display-allocation, what does it say?

Thank you, Ralph.

This is the command I used inside my submission script:

 mpirun --display-allocation -np 4 -hostfile hosts ./program

And this is the output I got.

Data for node: Name: node03  Num slots: 4   Max slots: 0
Data for node: Name: node02  Num slots: 4   Max slots: 0
Data for node: Name: node04  Num slots: 4   Max slots: 0
Data for node: Name: node01  Num slots: 4   Max slots: 0

If I run the same mpirun command on the cluster head node "clhead" then this is 
what I get:

Data for node: Name: clhead  Num slots: 0   Max slots: 0

Re: [OMPI users] Best way to reduce 3D array

2010-04-07 Thread Gus Correa

Hi Derek

Cole, Derek E wrote:
Thanks for the ideas. 
I did finally end up getting this working by sending back to 
the master process. It's quite ugly, and added a good bit of 
MPI to the code, but it works for now, 
and I will revisit this later.

Is the MPI code uglier than the OO-stuff you mentioned before? :)

That you parallelized the code is an accomplishment anyway.
Maybe "It works" is the first level of astonishment and
reward one can get from programming, particularly in MPI! :)
Unfortunately, "It is efficient", "It is portable",
"It is easy to change and maintain", etc, seem to come later,
at least in real world conditions.
(OK, throw me eggs and tomatoes ...)

However, your quick description suggests that you cared about the
other items too, using MPI types to make the code more elegant and 
efficient, for instance.


In principle I agree with another posting
(I can't find it now) that advocated careful code design,
from scratch, with a parallel algorithm in mind,
and, whenever possible, taking advantage of quality libraries built
on top of MPI (e.g. PETSc).

However, most of the time we are patching and refurbishing
existing code, particularly when it comes to parallelization
(with MPI, OpenMP or other).
At least this is the reality I see in our area here (Earth Sciences).

I would guess in other areas of engineering it is the same.
Most of the time architects are dealing with building maintenance,
then sometimes with building reform, but only rarely do they work on the
design of a new building, or not?

I am not sure what the file system is, 
I think it is XFS, but I don't know much about why this 
has an effect on the output - just the way files can be 
opened at once or something? 



I meant parallel (PVFS, etc) versus serial (ext3, xfs, etc)
file systems.
I guess you have XFS on one machine,
mounted over NFS across the cluster.
If you send too many read and write requests you may
overwhelm NFS, at least this is my experience.
By contrast, MPI scales much better with the number of
processes that exchange messages.
Hence, better funnel the data flow through MPI instead,
and let NFS talk to a single process (or to a single process at a time).
For this type of situation the old scheme:
"master reads and data is scattered;
data is gathered and master writes",
works fine, regardless of whether you
may think your code looks ugly or not.
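
A minimal sketch of that scheme (sizes made up), where only rank 0 ever 
touches the file:

  /* gather everything to rank 0; only rank 0 writes (NFS-friendly) */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define N 1000   /* local chunk size, made up */

  int main(int argc, char **argv)
  {
      int rank, nprocs;
      double local[N], *global = NULL;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* ... fill local[] ... */

      if (rank == 0)
          global = malloc((size_t)N * nprocs * sizeof(double));

      MPI_Gather(local, N, MPI_DOUBLE, global, N, MPI_DOUBLE,
                 0, MPI_COMM_WORLD);

      if (rank == 0) {
          FILE *f = fopen("out.bin", "wb");
          fwrite(global, sizeof(double), (size_t)N * nprocs, f);
          fclose(f);
          free(global);
      }

      MPI_Finalize();
      return 0;
  }
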
Ricardo Reis suggested another solution, using a loop and MPI_Barrier
to serialize the writes from all processes,
and avoid file contention on NFS.
Another way would be to use MPI-IO.
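
With MPI-IO each process writes its own chunk directly, and the MPI 
library deals with the contention (again just a sketch, same made-up sizes):

  /* each rank writes its chunk at its own offset, no funneling */
  #include <mpi.h>

  #define N 1000   /* local chunk size, made up */

  int main(int argc, char **argv)
  {
      int rank;
      double local[N];
      MPI_File fh;
      MPI_Offset offset;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* ... fill local[] ... */

      offset = (MPI_Offset)rank * N * sizeof(double);
      MPI_File_open(MPI_COMM_WORLD, "out.bin",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY,
                    MPI_INFO_NULL, &fh);
      MPI_File_write_at_all(fh, offset, local, N, MPI_DOUBLE,
                            MPI_STATUS_IGNORE);
      MPI_File_close(&fh);

      MPI_Finalize();
      return 0;
  }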

I did have to end up using an MPI Data type, 
because this 3D domain was strided nicely in X, 
but not the other dimensions. 
The domain is larger in Z, 
so I wanted to order my loops such that Z is the innermost. 
This helped cut down some of the MPI overhead. 
It would have been nice to avoid this, 
but I could not think of the way to do it, 
and still have all of the computes working on the largest
section of data possible.


> Derek
>

I agree.  The underlying algorithm to some extent dictates how MPI
should be used, and how the data is laid out and distributed.

In the best of the worlds you could devise and develop
an algorithm that is both computationally and MPI (i.e. 
communication-wise) efficient, and simple, and clean, etc.

More often than not one doesn't have the time or support to
do this, right?  The end user seldom cares about it either.
At least this has been my experience here.


Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Ricardo Reis
Sent: Monday, April 05, 2010 3:20 PM
To: Open MPI Users
Subject: Re: [OMPI users] Best way to reduce 3D array

On Mon, 5 Apr 2010, Rob Latham wrote:


On Tue, Mar 30, 2010 at 11:51:39PM +0100, Ricardo Reis wrote:
 If using the master/slave IO model, would it be better to cycle 
through all the processes and each one would write its part of the 
array into the file. This file would be open in "stream" mode...


 like

 do p=0,nprocs-1

   if(my_rank.eq.p)then

 openfile (append mode)
 write_to_file
 closefile

   endif

   call MPI_Barrier(world,ierr)

 enddo
Note that there's no guarantee of the order here, though. Nothing 
prevents rank 30 from hitting that loop before rank 2 does.  To ensure


don't they all have to hit the same Barrier? I think that will ensure order in 
this business... or am I being blind to something?

I will agree, though, this is not the best solution. I use this kind 
of arrangement when I'm desperate to do some printf kind of debugging and want 
it ordered by process. Never had a problem with it.

I mean, I assume there is some sort of sync before the do cycle starts.