Re: [OMPI users] Hide Abort output
Indeed, it seems that it addresses what I want! I read the discussions on the
MPI Forum list, which are very interesting. I began to develop a termination
scheme before seeing that the use of MPI_Abort() should be sufficient. But I
didn't post anything, since my case is particular: I have iterative
computations. Thus, I can check at certain points whether a termination
message has been received (with the asynchronous receive posted at the
beginning of the program) -- the messages have to be sent in a "recursive"
(tree-like) way to keep the number of messages exchanged between tasks small,
because there is no "multicast" way of sending something. In my case, I don't
need special ending requirements if tasks share files, etc., which is not the
general case for standardizing an API. But I still think that an MPI_Quit()
would be very useful.

Thank you very much!

.Yves.

On Tuesday 06 April 2010 22:40:29, Jeff Squyres wrote:
> BTW, we diverged quite a bit on this thread -- Yves -- does the
> functionality that was fixed by Ralph address your original issue?
>
> On Apr 2, 2010, at 10:21 AM, Ralph Castain wrote:
> > Testing found that I had missed a spot here, so we weren't fully
> > suppressing messages (including MPI_Abort). So the corrected fix is in
> > r22926, and will be included in tonight's tarball.
> >
> > I also made --quiet be a new MCA param, orte_execute_quiet, so you can
> > put it in your environment instead of only on the cmd line.
> >
> > HTH
> > Ralph
> >
> > On Apr 2, 2010, at 1:18 AM, Ralph Castain wrote:
> > > Actually, a cmd line option to mpirun already existed for this purpose.
> > > Unfortunately, it wasn't properly being respected, so even knowing
> > > about it wouldn't have helped.
> > >
> > > I have fixed this as of r22925 on our developer's trunk and started the
> > > script to generate a fresh nightly tarball. Give it a little time and
> > > then you can find it on the web site:
> > >
> > > http://www.open-mpi.org/nightly/trunk/
> > >
> > > Use the -q or --quiet option and the message will be suppressed. I will
> > > request that this be included in the upcoming 1.4.2 and 1.5.0 releases.
> > >
> > > On Apr 1, 2010, at 8:38 PM, Yves Caniou wrote:
> > >> For information, I use the Debian-packaged Open MPI 1.4.1.
> > >>
> > >> Cheers.
> > >>
> > >> .Yves.
> > >>
> > >> On Wednesday 31 March 2010 12:41:34, Jeff Squyres (jsquyres) wrote:
> > >>> At present there is no such feature, but it should not be hard to
> > >>> add.
> > >>>
> > >>> Can you guys be a little more specific about exactly what you are
> > >>> seeing and exactly what you want to see? (And what version you're
> > >>> working with - I'll caveat my discussion that this may be a
> > >>> 1.5-and-forward thing)
> > >>>
> > >>> -jms
> > >>> Sent from my PDA. No type good.
> > >>>
> > >>> - Original Message -
> > >>> From: users-boun...@open-mpi.org
> > >>> To: Open MPI Users
> > >>> Sent: Wed Mar 31 05:38:48 2010
> > >>> Subject: Re: [OMPI users] Hide Abort output
> > >>>
> > >>> I have to say this is a very common issue for our users. They
> > >>> repeatedly report the long Open MPI MPI_Abort() message in help
> > >>> queries and fail to look for the application error message about the
> > >>> root cause. A short MPI_Abort() message that said "look elsewhere
> > >>> for the real error message" would be useful.
> > >>>
> > >>> Cheers,
> > >>> David
> > >>>
> > >>> On 03/31/2010 07:58 PM, Yves Caniou wrote:
> > >>>> Dear all,
> > >>>>
> > >>>> I am using the MPI_Abort() command in an MPI program.
> > >>>> I would like to not see the note explaining that the command caused
> > >>>> Open MPI to kill all the jobs and so on.
> > >>>> I thought that I could find an --mca parameter, but couldn't grep it.
> > >>>> The only ones deal with the delay and with printing more information
> > >>>> (the stack).
> > >>>>
> > >>>> Is there a means to avoid the printing of the note (except the
> > >>>> 2>/dev/null tip)? Or to delay this printing?
> > >>>>
> > >>>> Thank you.
> > >>>>
> > >>>> .Yves.
> > >>
> > >> --
> > >> Yves Caniou
> > >> Associate Professor at Université Lyon 1,
> > >> Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> > >> Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> > >> * in Information Technology Center, The University of Tokyo,
> > >>   2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
> > >>   tel: +81-3-5841-0540
> > >> * in National Institute of Informatics
> > >>   2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
> > >>   tel: +81-3-4212-2412
> > >> http://graal.ens-lyon.fr/~ycaniou/
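Putting Ralph's fix to use: once a build containing it is installed, the
suppression described above can be requested in any of the following ways.
The spellings follow the usual Open MPI conventions for command-line options
and MCA parameters (including the OMPI_MCA_ environment prefix); treat this
as a sketch and verify against `mpirun --help` / `ompi_info` on your build.

    # on the mpirun command line
    mpirun -q -np 4 ./a.out
    mpirun --quiet -np 4 ./a.out

    # as an MCA parameter on the command line
    mpirun --mca orte_execute_quiet 1 -np 4 ./a.out

    # or in the environment, so no command-line change is needed
    export OMPI_MCA_orte_execute_quiet=1
    mpirun -np 4 ./a.out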
[OMPI users] OpenMPI multithreaded performance
Dear Open MPI team,

How much performance should we expect when using MPI's multithreading
capability (MPI_Init_thread with MPI_THREAD_MULTIPLE)? It seems that no
performance gain shows up in some simple tests, such as activating multiple
MPI channels, overlapping communication and computation, and so on.

Thanks in advance,
Piero

Piero LANUCARA
Consorzio interuniversitario per le Applicazioni di Supercalcolo Per
Università e Ricerca
via dei Tizii n.6b
00185 Roma (Italy)
phone: 06-44486709, fax: 06-4957083
cell: 3403006589
E-mail: piero.lanuc...@caspur.it
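For reference, the pattern referred to as "MPI_Init_thread in multiple
format" looks roughly like the C sketch below; the check on `provided` is
illustrative and not from the original post. Note that Open MPI of that
vintage generally had to be built with thread support (e.g. configured with
--enable-mpi-threads) before MPI_THREAD_MULTIPLE would actually be granted.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Ask for the highest thread level; the library may grant less. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0 && provided < MPI_THREAD_MULTIPLE)
            printf("Requested MPI_THREAD_MULTIPLE, got level %d\n", provided);

        /* ... threaded communication/computation benchmark would go here ... */

        MPI_Finalize();
        return 0;
    }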
Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
> If you run your cmd with the hostfile option and add
> --display-allocation, what does it say?

Thank you, Ralph.

This is the command I used inside my submission script:

mpirun --display-allocation -np 4 -hostfile hosts ./program

And this is the output I got.

Data for node: Name: node03   Num slots: 4   Max slots: 0
Data for node: Name: node02   Num slots: 4   Max slots: 0
Data for node: Name: node04   Num slots: 4   Max slots: 0
Data for node: Name: node01   Num slots: 4   Max slots: 0

If I run the same mpirun command on the cluster head node "clhead" then this
is what I get:

Data for node: Name: clhead   Num slots: 0   Max slots: 0
Data for node: Name: node01   Num slots: 1   Max slots: 0
Data for node: Name: node02   Num slots: 1   Max slots: 0
Data for node: Name: node03   Num slots: 1   Max slots: 0
Data for node: Name: node04   Num slots: 1   Max slots: 0

The content of the 'hosts' file:

node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1

= Serge

On Apr 6, 2010, at 12:18 PM, Serge wrote:

> Hi,
>
> OpenMPI integrates with Sun Grid Engine really well, and one does not need
> to specify any parameters for the mpirun command to launch the processes
> on the compute nodes, that is having in the submission script
> "mpirun ./program" is enough; there is no need for "-np XX" or
> "-hostfile file_name".
>
> However, there are cases when being able to specify the hostfile is
> important (hybrid jobs, users with MPICH jobs, etc.). For example, with
> Grid Engine I can request four 4-core nodes, that is total of 16 slots.
> But I also want to specify how to distribute processes on the nodes, so I
> create the file 'hosts'
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
>
> mpirun -hostfile hosts ./program
>
> With Open MPI 1.2.x everything worked properly, meaning that Open MPI
> could count the number of slots specified in the 'hosts' file - 4 (i.e.
> effectively supplying the mpirun command with the -np parameter), as well
> as properly distribute processes on the compute nodes (one process per
> host).
>
> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file
> properly at all. All the processes get launched on just one node -- the
> shepherd host.
>
> The format of the 'hosts' file does not matter. It can be, say
>
> node01
> node01
> node02
> node02
>
> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
> problem, however Open MPI 1.4.x would not.
>
> The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with
> OMPI 1.3.4 and SGE 6.2u4.
>
> It's important to notice that if the mpirun command is run interactively,
> not from inside the Grid Engine script, then it interprets the content of
> the host file just fine.
>
> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents
> expected behavior, and is it possible to get it from OMPI 1.4.x by, say,
> tuning some parameters?
>
> = Serge
Re: [OMPI users] OpenMPI multithreaded performance
On 4/7/2010 1:20 AM, Piero Lanucara wrote:
> Dear Open MPI team,
>
> How much performance should we expect when using MPI's multithreading
> capability (MPI_Init_thread with MPI_THREAD_MULTIPLE)? It seems that no
> performance gain shows up in some simple tests, such as activating
> multiple MPI channels, overlapping communication and computation, and
> so on.

Maybe I don't understand your question. Are you saying that none of the
references found by search terms such as "hybrid mpi openmp" are useful for
you? They cover so many topics, you would have to be much more specific about
which topics you want in more detail.

--
Tim Prince
Re: [OMPI users] OpenMPI multithreaded performance
Thanks for your answer, and sorry for the misunderstanding.

My question is about the performance of this implementation with regard to
its multithreading capabilities. Of course MPI+OpenMP is one choice, but (as
far as I understand) you should also be able to gain a lot of performance by
using MPI's multithreading capabilities directly (for example in the
communication part of your code), without using OpenMP. So the question is
whether this is true with the current implementation of Open MPI.

Thanks again,
Piero

> Maybe I don't understand your question. Are you saying that none of the
> references found by search terms such as "hybrid mpi openmp" are useful
> for you? They cover so many topics, you would have to be much more
> specific about which topics you want in more detail.
>
> --
> Tim Prince
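To make the "overlapping comm and computation" part of the question concrete,
here is a minimal sketch of the kind of test being described, written with
plain nonblocking point-to-point calls (which do not even require
MPI_THREAD_MULTIPLE). The compute_interior()/compute_boundary() stubs are
placeholders, not from the original post; whether the transfers actually
progress during the computation depends on the interconnect and on the MPI
library's progress engine, which is essentially what is being asked here.

    #include <mpi.h>

    /* Placeholder work routines -- application-specific, assumed to exist. */
    static void compute_interior(void) { /* work that does not need the halo */ }
    static void compute_boundary(const double *halo, int n) { (void)halo; (void)n; }

    /* Start nonblocking halo exchanges, do independent work, then wait. */
    void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                              int left, int right, MPI_Comm comm)
    {
        MPI_Request req[2];

        MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, comm, &req[1]);

        compute_interior();                    /* the intended overlap window */

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        compute_boundary(recvbuf, n);          /* work that needed the halo data */
    }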
Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
Serge writes:

> However, there are cases when being able to specify the hostfile is
> important (hybrid jobs, users with MPICH jobs, etc.).

[I don't understand what MPICH has to do with it.]

> For example,
> with Grid Engine I can request four 4-core nodes, that is total of 16
> slots. But I also want to specify how to distribute processes on the
> nodes, so I create the file 'hosts'
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program

Regardless of any open-mpi bug, I'd have thought it was easier just to use
-npernode in that case. What's the problem with that? It seems to me best
generally to control the distribution of processes with mpirun on the
SGE-allocated nodes than to concoct host files as we used to do here, e.g. to
get -byslot v. -bynode behaviour (or vice versa).
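A concrete sketch of that -npernode suggestion, assuming an mpirun from the
1.3/1.4 series where the option is available: with the SGE allocation already
known to mpirun, one process per allocated node is simply

    mpirun -npernode 1 ./program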
Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
>> However, there are cases when being able to specify the hostfile is
>> important (hybrid jobs, users with MPICH jobs, etc.).

> [I don't understand what MPICH has to do with it.]

This was just an example of how the different behavior of OMPI 1.4 may cause
problems. The MPICH library is not the subject of discussion. MPICH requires
the use of a hostfile, which is generated by SGE, and supplying it in the
submission script of an Open MPI 1.2.x job has the expected effect. This is
different with Open MPI 1.4.x, which does not appear to interpret the host
file properly.

>> For example,
>> with Grid Engine I can request four 4-core nodes, that is total of 16
>> slots. But I also want to specify how to distribute processes on the
>> nodes, so I create the file 'hosts'
>>
>> node01 slots=1
>> node02 slots=1
>> node03 slots=1
>> node04 slots=1
>>
>> and modify the line in the submission script to:
>> mpirun -hostfile hosts ./program

> Regardless of any open-mpi bug, I'd have thought it was easier just to
> use -npernode in that case. What's the problem with that? It seems to
> me best generally to control the distribution of processes with mpirun
> on the SGE-allocated nodes than to concoct host files as we used to do
> here, e.g. to get -byslot v. -bynode behaviour (or vice versa).

This is exactly what I am doing -- controlling the distribution of processes
with mpirun on the SGE-allocated nodes, by supplying the hostfile. Grid
Engine allocates nodes and generates a hostfile, which I can then modify
however I want before running the mpirun command. Moreover, it gives more
control, by allowing specific SGE parallel environments to be created where
the process distribution is predetermined -- one less worry for users playing
with mpirun options. The example in my initial email was deliberately
simplified to demonstrate the problem.

= Serge
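As an illustration of the workflow Serge describes (a hypothetical one-liner,
not from the original mail): inside an SGE job the allocated nodes are listed
in the file named by $PE_HOSTFILE, one host per line with its slot count in
the second column, so a one-process-per-node hostfile can be derived before
calling mpirun:

    # derive a one-slot-per-node hostfile from the SGE allocation
    awk '{print $1, "slots=1"}' "$PE_HOSTFILE" > hosts
    mpirun -hostfile hosts ./program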
Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
I should have read your original note more closely and I would have spotted
the issue. How a hostfile is used changed between OMPI 1.2 and the 1.3 (and
above) releases per user requests. It was actually the SGE side of the
community that led the change :-)

You can get a full description of how OMPI uses hostfiles in two ways:

* from the man pages: man orte_hosts
* from the wiki: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

As far as I can tell, OMPI 1.4.x is behaving per that specification. You get
four slots on each node in your submission script because that is what SGE
allocated to you. The hostfile filters that when launching, using the
provided info to tell it how many slots on each node within the allocation to
use for that application.

I suggest reading the above documentation to see how OMPI uses hostfiles, and
then let us know if you have any questions, concerns, or see a deviation from
the described behavior.

HTH
Ralph

On Apr 7, 2010, at 5:36 AM, Serge wrote:

> > If you run your cmd with the hostfile option and add
> > --display-allocation, what does it say?
>
> Thank you, Ralph.
>
> This is the command I used inside my submission script:
>
> mpirun --display-allocation -np 4 -hostfile hosts ./program
>
> And this is the output I got.
>
> Data for node: Name: node03   Num slots: 4   Max slots: 0
> Data for node: Name: node02   Num slots: 4   Max slots: 0
> Data for node: Name: node04   Num slots: 4   Max slots: 0
> Data for node: Name: node01   Num slots: 4   Max slots: 0
>
> If I run the same mpirun command on the cluster head node "clhead" then
> this is what I get:
>
> Data for node: Name: clhead   Num slots: 0   Max slots: 0
> Data for node: Name: node01   Num slots: 1   Max slots: 0
> Data for node: Name: node02   Num slots: 1   Max slots: 0
> Data for node: Name: node03   Num slots: 1   Max slots: 0
> Data for node: Name: node04   Num slots: 1   Max slots: 0
>
> The content of the 'hosts' file:
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> = Serge
>
> On Apr 6, 2010, at 12:18 PM, Serge wrote:
>
>> Hi,
>> OpenMPI integrates with Sun Grid Engine really well, and one does not
>> need to specify any parameters for the mpirun command to launch the
>> processes on the compute nodes, that is having in the submission script
>> "mpirun ./program" is enough; there is no need for "-np XX" or
>> "-hostfile file_name".
>> However, there are cases when being able to specify the hostfile is
>> important (hybrid jobs, users with MPICH jobs, etc.). For example, with
>> Grid Engine I can request four 4-core nodes, that is total of 16 slots.
>> But I also want to specify how to distribute processes on the nodes, so
>> I create the file 'hosts'
>>
>> node01 slots=1
>> node02 slots=1
>> node03 slots=1
>> node04 slots=1
>>
>> and modify the line in the submission script to:
>> mpirun -hostfile hosts ./program
>>
>> With Open MPI 1.2.x everything worked properly, meaning that Open MPI
>> could count the number of slots specified in the 'hosts' file - 4 (i.e.
>> effectively supplying the mpirun command with the -np parameter), as
>> well as properly distribute processes on the compute nodes (one process
>> per host).
>> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file
>> properly at all. All the processes get launched on just one node -- the
>> shepherd host.
>> The format of the 'hosts' file does not matter. It can be, say
>>
>> node01
>> node01
>> node02
>> node02
>>
>> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
>> problem, however Open MPI 1.4.x would not.
>> The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with
>> OMPI 1.3.4 and SGE 6.2u4.
>> It's important to notice that if the mpirun command is run
>> interactively, not from inside the Grid Engine script, then it
>> interprets the content of the host file just fine.
>> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents
>> expected behavior, and is it possible to get it from OMPI 1.4.x by, say,
>> tuning some parameters?
>> = Serge
Re: [OMPI users] Best way to reduce 3D array
Thanks for the ideas. I did finally end up getting this working by sending
back to the master process. It's quite ugly, and added a good bit of MPI to
the code, but it works for now, and I will revisit this later.

I am not sure what the file system is -- I think it is XFS, but I don't know
much about why this has an effect on the output; is it just a limit on how
many files can be opened at once, or something like that?

I did have to end up using an MPI datatype, because this 3D domain was
strided nicely in X, but not in the other dimensions. The domain is larger in
Z, so I wanted to order my loops such that Z is the innermost. This helped
cut down some of the MPI overhead. It would have been nice to avoid this, but
I could not think of a way to do it and still have all of the compute
processes working on the largest section of data possible.

Derek

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Ricardo Reis
Sent: Monday, April 05, 2010 3:20 PM
To: Open MPI Users
Subject: Re: [OMPI users] Best way to reduce 3D array

On Mon, 5 Apr 2010, Rob Latham wrote:

> On Tue, Mar 30, 2010 at 11:51:39PM +0100, Ricardo Reis wrote:
>>
>> If using the master/slave IO model, would it be better to cycle
>> through all the processes, and each one would write its part of the
>> array into the file? This file would be opened in "stream" mode...
>>
>> like
>>
>> do p=0,nprocs-1
>>
>>    if(my_rank.eq.p)then
>>
>>       openfile (append mode)
>>       write_to_file
>>       closefile
>>
>>    endif
>>
>>    call MPI_Barrier(world,ierr)
>>
>> enddo
>
> Note that there's no guarantee of the order here, though. Nothing
> prevents rank 30 from hitting that loop before rank 2 does.

To ensure that, don't they all have to hit the same Barrier? I think that
will ensure order in this business... or am I being blind to something?

I will agree, though, that this is not the best solution. I use this kind of
arrangement when I'm desperate to do some printf kind of debugging and want
it ordered by process. Never had a problem with it. I mean, I assume there is
some sort of sync before the do cycle starts.

cheers!

Ricardo Reis

'Non Serviam'

PhD candidate @ Lasef
Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt

Cultural Instigator @ Rádio Zero
http://www.radiozero.pt

Keep them Flying! Ajude a/help Aero Fénix!
http://www.aeronauta.com/aero.fenix

http://www.flickr.com/photos/rreis/

< sent with alpine 2.00 >
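Since Derek mentions building an MPI datatype for a 3D slab that is
contiguous only in X, a minimal sketch of that idea in C is shown below,
using MPI_Type_create_subarray. The array sizes, the tag, and the
send-to-rank-0 pattern are illustrative assumptions, not Derek's actual code;
the same datatype can equally be used on the receiving side to place incoming
data straight into a global array.

    #include <mpi.h>

    /* Describe an (nx x ny x nz) slab at offset (ox,oy,oz) inside a larger
     * (gx x gy x gz) array stored in C (row-major) order, and send it to the
     * master in one call, without packing it by hand. */
    void send_slab_to_master(double *array, int gx, int gy, int gz,
                             int nx, int ny, int nz,
                             int ox, int oy, int oz, MPI_Comm comm)
    {
        int sizes[3]    = {gx, gy, gz};   /* dimensions of the full array   */
        int subsizes[3] = {nx, ny, nz};   /* dimensions of the slab to send */
        int starts[3]   = {ox, oy, oz};   /* offset of the slab             */
        MPI_Datatype slab;

        MPI_Type_create_subarray(3, sizes, subsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &slab);
        MPI_Type_commit(&slab);

        MPI_Send(array, 1, slab, 0, 99, comm);   /* rank 0 is the master */

        MPI_Type_free(&slab);
    }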
Re: [OMPI users] Problem in remote nodes
Jeff,

In my case, it was the firewall. It was restricting communication between the
compute nodes to ssh only. I appreciate the help.

Rob

Jeff Squyres (jsquyres) wrote:

Those are normal ssh messages, I think - an ssh session may try multiple auth
methods before one succeeds.

You're absolutely sure that there's no firewalling software and selinux is
disabled? Ompi is behaving as if it is trying to communicate and failing
(e.g., it's hanging while trying to open some tcp sockets back).

Can you open random tcp sockets between your nodes? (E.g., in non-mpi
processes)

-jms
Sent from my PDA. No type good.

- Original Message -
From: users-boun...@open-mpi.org
To: Open MPI Users
Sent: Wed Mar 31 06:25:43 2010
Subject: Re: [OMPI users] Problem in remote nodes

I've been checking /var/log/messages on the compute node, and there is
nothing new after executing 'mpirun --host itanium2 -np 2 helloworld.out';
but in the /var/log/messages file on the remote node the following messages
appear, and nothing about unix_chkpwd:

Mar 31 11:56:51 itanium2 sshd(pam_unix)[15349]: authentication failure;
  logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=itanium1 user=otro
Mar 31 11:56:53 itanium2 sshd[15349]: Accepted publickey for otro from
  192.168.3.1 port 40999 ssh2
Mar 31 11:56:53 itanium2 sshd(pam_unix)[15351]: session opened for user otro
  by (uid=500)
Mar 31 11:56:53 itanium2 sshd(pam_unix)[15351]: session closed for user otro

It seems that the authentication fails at first, but in the next message it
connects with the node...

On Tue, 30 March 2010, 20:02, Robert Collyer wrote:

> I've been having similar problems using Fedora Core 9. I believe the
> issue may be with SELinux, but this is just an educated guess. In my
> setup, shortly after a login via mpi, there is a notation in the
> /var/log/messages on the compute node as follows:
>
> Mar 30 12:39:45 kernel: type=1400 audit(1269970785.534:588):
>   avc: denied { read } for pid=8047 comm="unix_chkpwd" name="hosts"
>   dev=dm-0 ino=24579
>   scontext=system_u:system_r:system_chkpwd_t:s0-s0:c0.c1023
>   tcontext=unconfined_u:object_r:etc_runtime_t:s0 tclass=file
>
> which says SELinux denied unix_chkpwd read access to hosts.
>
> Are you getting anything like this?
>
> In the meantime, I'll check if allowing unix_chkpwd read access to hosts
> eliminates the problem on my system, and if it works, I'll post the
> steps involved.
>
> uriz.49...@e.unavarra.es wrote:
>> I've been investigating and there is no firewall that could stop TCP
>> With the option --mca plm_base_verbose 30 I get the following output:
>>
>> [itanium1] /home/otro > mpirun --mca plm_base_verbose 30 --host itanium2 helloworld.out
>> [itanium1:08311] mca: base: components_open: Looking for plm components
>> [itanium1:08311] mca: base: components_open: opening plm components
>> [itanium1:08311] mca: base: components_open: found loaded component rsh
>> [itanium1:08311] mca: base: components_open: component rsh has no register function
>> [itanium1:08311] mca: base: components_open: component rsh open function successful
>> [itanium1:08311] mca: base: components_open: found loaded component slurm
>> [itanium1:08311] mca: base: components_open: component slurm has no register function
>> [itanium1:08311] mca: base: components_open: component slurm open function successful
>> [itanium1:08311] mca:base:select: Auto-selecting plm components
>> [itanium1:08311] mca:base:select:( plm) Querying component [rsh]
>> [itanium1:08311] mca:base:select:( plm) Query of component [rsh] set priority to 10
>> [itanium1:08311] mca:base:select:( plm) Querying component [slurm]
>> [itanium1:08311] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
>> [itanium1:08311] mca:base:select:( plm) Selected component [rsh]
>> [itanium1:08311] mca: base: close: component slurm closed
>> [itanium1:08311] mca: base: close: unloading component slurm
>>
>> --Hangs here
>>
>> It seems a slurm problem??
>>
>> Thanks for any ideas
>>
>> On Fri, 19 March 2010, 17:57, Ralph Castain wrote:
>>
>>> Did you configure OMPI with --enable-debug? You should do this so that
>>> more diagnostic output is available.
>>>
>>> You can also add the following to your cmd line to get more info:
>>>
>>> --debug --debug-daemons --leave-session-attached
>>>
>>> Something is likely blocking proper launch of the daemons and processes
>>> so you aren't getting to the btl's at all.
>>>
>>> On Mar 19, 2010, at 9:42 AM, uriz.49...@e.unavarra.es wrote:
>>>
>>>> The processes are running on the remote nodes but they don't give the
>>>> response to the origin node. I don't know why. With the option
>>>> --mca btl_base_verbose 30, I have the same problems and it doesn't
>>>> show any message.
>>>>
>>>> Thanks
>>>>
>>>>> On Wed, Mar 17, 2010 at 1:41 PM, Jeff Squyres wrote:
>>>>>
>>>>>> On Mar 17
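Jeff's "can you open random TCP sockets between your nodes?" check can be
done without MPI at all. One common way, sketched here under the assumption
that the netcat utility is installed and port 5000 is free (some netcat
variants want `nc -l -p 5000`):

    # on itanium2: listen on an arbitrary unprivileged port
    nc -l 5000

    # on itanium1: connect and type a line; it should appear on itanium2
    nc itanium2 5000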
Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
Thank you, Ralph. I have read the wiki and the man pages. But I am still not
sure I understand what is going on in my example. I cannot filter the slots
allocated by SGE. I also think that there is a deviation from the behavior
described on the wiki (precisely example 5 from the top in section "NOW
RUNNING FROM THE INTERACTIVE SHELL"). So, below, I am copy-pasting my
session, and I am asking if you could please follow my line of thought and
correct me where I am mistaken?

Here I request an interactive session with 16 slots on 4 four-core nodes
like so:

$ qrsh -cwd -V -pe ompi* 16 -l h_rt=10:00:00,h_vmem=2G bash

Now, I show that all 16 slots are available and everything is working as
expected with both OMPI 1.2.9 and OMPI 1.4.1:

graphics01 $ ~/openmpi/gnu141/bin/mpirun hostname
[graphics01:24837] ras:gridengine: JOB_ID: 89052
graphics01
graphics01
graphics01
graphics01
graphics04
graphics04
graphics02
graphics02
graphics04
graphics02
graphics04
graphics02
graphics03
graphics03
graphics03
graphics03

graphics01 $ ~/openmpi/gnu129/bin/mpirun hostname
[graphics01:24849] ras:gridengine: JOB_ID: 89052
graphics01
graphics04
graphics02
graphics03
graphics01
graphics04
graphics02
graphics03
graphics01
graphics03
graphics01
graphics04
graphics03
graphics04
graphics02
graphics02

Now, I want to filter the list of 16 slots by using the host file. I want to
run 1 process per node.

graphics01 $ cat hosts
graphics01 slots=1
graphics02 slots=1
graphics03 slots=1
graphics04 slots=1

And I try to use it with OMPI 1.2.9 and 1.4.1:

graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts hostname
graphics04
graphics01
graphics03
graphics02

graphics01 $ ~/openmpi/gnu141/bin/mpirun -hostfile hosts hostname
[graphics01:24903] ras:gridengine: JOB_ID: 89052
graphics01

So, as you can see, OMPI 1.4.1 did not recognize any hosts except the current
shepherd host.

Moreover, similarly to the example down below on
https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan, I create two other host
files:

graphics01 $ cat hosts1
graphics02
graphics02

graphics01 $ cat hosts2
graphics02 slots=2

And then try to use them with both versions of Open MPI. It works properly
with OMPI 1.2.9 (the same way as showed on the wiki!), but does NOT with
1.4.1:

graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts1 hostname
graphics02
graphics02

graphics01 $ ~/openmpi/gnu129/bin/mpirun -hostfile hosts2 hostname
graphics02
graphics02

graphics01 $ ~/openmpi/gnu141/bin/mpirun -hostfile hosts1 hostname
[graphics01:25756] ras:gridengine: JOB_ID: 89055
--------------------------------------------------------------------------
There are no allocated resources for the application
  hostname
that match the requested mapping:
  hosts1

Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

= Serge

Ralph Castain wrote:

> I should have read your original note more closely and I would have
> spotted the issue. How a hostfile is used changed between OMPI 1.2 and
> the 1.3 (and above) releases per user requests. It was actually the SGE
> side of the community that led the change :-)
>
> You can get a full description of how OMPI uses hostfiles in two ways:
>
> * from the man pages: man orte_hosts
> * from the wiki: https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan
>
> As far as I can tell, OMPI 1.4.x is behaving per that specification. You
> get four slots on your submission script because that is what SGE
> allocated to you.
> The hostfile filters that when launching, using the provided info to
> tell it how many slots on each node within the allocation to use for
> that application.
>
> I suggest reading the above documentation to see how OMPI uses
> hostfiles, and then let us know if you have any questions, concerns, or
> see a deviation from the described behavior.
>
> HTH
> Ralph
>
> On Apr 7, 2010, at 5:36 AM, Serge wrote:
>
>>> If you run your cmd with the hostfile option and add
>>> --display-allocation, what does it say?
>>
>> Thank you, Ralph.
>>
>> This is the command I used inside my submission script:
>>
>> mpirun --display-allocation -np 4 -hostfile hosts ./program
>>
>> And this is the output I got.
>>
>> Data for node: Name: node03   Num slots: 4   Max slots: 0
>> Data for node: Name: node02   Num slots: 4   Max slots: 0
>> Data for node: Name: node04   Num slots: 4   Max slots: 0
>> Data for node: Name: node01   Num slots: 4   Max slots: 0
>>
>> If I run the same mpirun command on the cluster head node "clhead" then
>> this is what I get:
>>
>> Data for node: Name: clhead   Num slots: 0   Ma
Re: [OMPI users] Best way to reduce 3D array
Hi Derek

Cole, Derek E wrote:
> Thanks for the ideas. I did finally end up getting this working by
> sending back to the master process. It's quite ugly, and added a good
> bit of MPI to the code, but it works for now, and I will revisit this
> later.

Is the MPI code uglier than the OO-stuff you mentioned before? :)

That you parallelized the code is an accomplishment anyway. Maybe "It works"
is the first level of astonishment and reward one can get from programming,
particularly in MPI! :) Unfortunately, "It is efficient", "It is portable",
"It is easy to change and maintain", etc., seem to come later, at least under
real-world conditions. (OK, throw me eggs and tomatoes ...) However, your
quick description suggests that you cared about the other items too, using
MPI types to make the code more elegant and efficient, for instance.

In principle I agree with another posting (I can't find it now) that
advocated careful code design, from scratch, with a parallel algorithm in
mind, and, whenever possible, taking advantage of quality libraries built on
top of MPI (e.g. PETSc). However, most of the time we are patching and
refurbishing existing code, particularly when it comes to parallelization
(with MPI, OpenMP or other). At least this is the reality I see in our area
here (Earth Sciences). I would guess in other areas of engineering it is the
same. Most of the time architects are dealing with building maintenance, then
sometimes with building reform, but only rarely do they work on the design of
a new building, or not?

> I am not sure what the file system is, I think it is XFS, but I don't
> know much about why this has an effect on the output - just the way
> files can be opened at once or something?

I meant parallel (PVFS, etc.) versus serial (ext3, XFS, etc.) file systems. I
guess you have XFS on one machine, mounted over NFS across the cluster. If
you send too many read and write requests you may overwhelm NFS; at least
this is my experience. By contrast, MPI scales much better with the number of
processes that exchange messages. Hence, better to funnel the data flow
through MPI instead, and let NFS talk to a single process (or to a single
process at a time). For this type of situation the old scheme "master reads
and data is scattered; data is gathered and master writes" works fine,
regardless of whether you may think your code looks ugly or not. Ricardo Reis
suggested another solution, using a loop and MPI_Barrier to serialize the
writes from all processes and avoid file contention on NFS. Another way would
be to use MPI-IO.

> I did have to end up using an MPI Data type, because this 3D domain was
> strided nicely in X, but not the other dimensions. The domain is larger
> in Z, so I wanted to order my loops such that Z is the innermost. This
> helped cut down some of the MPI overhead. It would have been nice to
> avoid this, but I could not think of the way to do it, and still have
> all of the computes working on the largest section of data possible.
>
> Derek

I agree. The underlying algorithm to some extent dictates how MPI should be
used, and how the data is laid out and distributed. In the best of worlds you
could devise and develop an algorithm that is both computationally and MPI
(i.e. communication-wise) efficient, and simple, and clean, etc. More often
than not one doesn't have the time or support to do this, right? The end user
seldom cares about it either. At least this has been my experience here.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ricardo Reis
> Sent: Monday, April 05, 2010 3:20 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Best way to reduce 3D array
>
> On Mon, 5 Apr 2010, Rob Latham wrote:
>
>> On Tue, Mar 30, 2010 at 11:51:39PM +0100, Ricardo Reis wrote:
>>>
>>> If using the master/slave IO model, would it be better to cycle
>>> through all the processes, and each one would write its part of the
>>> array into the file? This file would be opened in "stream" mode...
>>>
>>> like
>>>
>>> do p=0,nprocs-1
>>>
>>>    if(my_rank.eq.p)then
>>>
>>>       openfile (append mode)
>>>       write_to_file
>>>       closefile
>>>
>>>    endif
>>>
>>>    call MPI_Barrier(world,ierr)
>>>
>>> enddo
>>
>> Note that there's no guarantee of the order here, though. Nothing
>> prevents rank 30 from hitting that loop before rank 2 does.
>
> To ensure that, don't they all have to hit the same Barrier? I think
> that will ensure order in this business... or am I being blind to
> something?
>
> I will agree, though, that this is not the best solution. I use this
> kind of arrangement when I'm desperate to do some printf kind of
> debugging and want it ordered by process. Never had a problem with it.
> I mean, I assume there is some sort of sync before the do cycle starts.
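Since MPI-IO came up as the third alternative (after gathering to the master
and serializing the writes with a barrier), here is a minimal sketch of what
it looks like for a 1-D decomposition: every rank writes its own contiguous
block of one shared file at its own offset, and no single process has to
funnel the data. The file name and the assumption of equal-sized, contiguous
local blocks are illustrative, not from the thread; note also that MPI-IO on
top of plain NFS still inherits NFS's consistency limitations, and shines
mainly on a parallel file system.

    #include <mpi.h>

    /* Each rank writes n doubles at offset rank*n*sizeof(double) of one
     * shared file. The collective call lets the MPI-IO layer coordinate and
     * optimize the access pattern instead of issuing independent writes. */
    void write_my_block(const double *buf, int n, MPI_Comm comm)
    {
        int rank;
        MPI_File fh;
        MPI_Offset offset;

        MPI_Comm_rank(comm, &rank);
        offset = (MPI_Offset)rank * n * (MPI_Offset)sizeof(double);

        MPI_File_open(comm, "field.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, offset, (void *)buf, n, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }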