Dear Eric, Aurelien and Eugene,

thanks a lot for helping. What Eugene said summarizes the situation exactly. I agree it's an issue with the full code, since the problem doesn't arise in simple examples like the one I posted. I was just hoping I was doing something trivially wrong and that someone would shout at me :-). I could post the full code, but it's quite long. At the moment I am still going through it searching for the problem, so I'll wait a bit before spamming the other users.
cheers

Enrico

On Mon, Sep 15, 2008 at 6:00 PM, <users-requ...@open-mpi.org> wrote:
> Send users mailing list submissions to
>        us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
>        users-requ...@open-mpi.org
>
> You can reach the person managing the list at
>        users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>   1. Re: Problem using VampirTrace (Thomas Ropars)
>   2. Re: Why compiling in global paths (only) for configuration
>      files? (Paul Kapinos)
>   3. Re: MPI_sendrecv = MPI_Send + MPI_RECV ? (Eugene Loh)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 15 Sep 2008 15:04:07 +0200
> From: Thomas Ropars <trop...@irisa.fr>
> Subject: Re: [OMPI users] Problem using VampirTrace
> To: Andreas Knüpfer <andreas.knuep...@tu-dresden.de>
> Cc: us...@open-mpi.org
> Message-ID: <48ce5d47.50...@irisa.fr>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello,
>
> I don't have a common file system for all cluster nodes.
>
> I've tried to run the application again with VT_UNIFY=no and to call
> vtunify manually. It works well. I managed to get the .otf file.
>
> Thank you.
>
> Thomas Ropars
>
>
> Andreas Knüpfer wrote:
>> Hello Thomas,
>>
>> sorry for the delay. My first assumption about the cause of your
>> problem is the so-called "unify" process. This is a post-processing
>> step which is performed automatically after the trace run. This step
>> needs read access to all files, though. So, do you have a common file
>> system for all cluster nodes?
>>
>> If yes, set the env variable VT_PFORM_GDIR to point there. Then the
>> traces will be copied there from the location VT_PFORM_LDIR, which can
>> still be a node-local directory. Then everything will be handled
>> automatically.
>>
>> If not, please set VT_UNIFY=no in order to disable automatic
>> unification. Then you need to call vtunify manually. Please copy all
>> files from the run directory that start with your OTF file prefix to a
>> common directory and call
>>
>> %> vtunify <number of processes> <file prefix>
>>
>> there. This should give you the <prefix>.otf file.
>>
>> Please give this a try. If it is not working, please send me an
>> 'ls -alh' of your trace directory/directories.
>>
>> Best regards, Andreas
>>
>>
>> P.S.: Please have my email on CC, I'm not on the us...@open-mpi.org list.
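Concretely, the manual unification Andreas describes might look like the sketch below for the four-process ring-vt run in Thomas's log further down. The host name node1 and the directory /shared/traces are made up for illustration; the copy would be repeated for each node:

    %> # per-process .def/.events files live in each node's VT_PFORM_LDIR (/tmp here)
    %> scp 'node1:/tmp/ring-vt.*' /shared/traces/
    %> # the .uctl unify control files were written to the run directory
    %> cp ring-vt.*.uctl /shared/traces/
    %> cd /shared/traces
    %> vtunify 4 ring-vt        # 4 processes, prefix ring-vt -> writes ring-vt.otf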
>>
>>
>>>> From: Thomas Ropars <trop...@irisa.fr>
>>>> Date: August 11, 2008 3:47:54 PM IST
>>>> To: us...@open-mpi.org
>>>> Subject: [OMPI users] Problem using VampirTrace
>>>> Reply-To: Open MPI Users <us...@open-mpi.org>
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to use VampirTrace.
>>>> I'm working with r19234 of the svn trunk.
>>>>
>>>> When I try to run a simple application with 4 processes on the same
>>>> computer, it works well. But if I try to run the same application
>>>> with the 4 processes executed on 4 different computers, I never get
>>>> the .otf file.
>>>>
>>>> I've tried to run with VT_VERBOSE=yes, and I get the following trace:
>>>>
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
>>>> VampirTrace: Thread object #0 created, total number is 1
>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
>>>> Ring : Start
>>>> Ring : End
>>>> [1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
>>>> [2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
>>>> [1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
>>>> [3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
>>>> [2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
>>>> [0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
>>>> [1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
>>>> [2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
>>>> [3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
>>>> [0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
>>>> [0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
>>>> [0]VampirTrace: Checking for ./ring-vt.1.uctl ...
>>>> [0]VampirTrace: Checking for ./ring-vt.2.uctl ...
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.def
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.def
>>>> [3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.events
>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.events
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.def
>>>> [1]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [2]VampirTrace: Thread object #0 deleted, leaving 0
>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.events
>>>> [3]VampirTrace: Thread object #0 deleted, leaving 0
>>>>
>>>>
>>>> Regards
>>>>
>>>> Thomas
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 15 Sep 2008 17:22:03 +0200
> From: Paul Kapinos <kapi...@rz.rwth-aachen.de>
> Subject: Re: [OMPI users] Why compiling in global paths (only) for
>        configuration files?
> To: Open MPI Users <us...@open-mpi.org>, Samuel Sarholz
>        <sarh...@rz.rwth-aachen.de>
> Message-ID: <48ce7d9b.8070...@rz.rwth-aachen.de>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Hi Jeff, hi all!
>
> Jeff Squyres wrote:
>> Short answer: yes, we do compile the prefix path into OMPI. Check out
>> this FAQ entry; I think it'll solve your problem:
>>
>>     http://www.open-mpi.org/faq/?category=building#installdirs
>
>
> Yes, reading man pages helps!
> Thank you for providing useful help.
>
> But setting the environment variable OPAL_PREFIX to an appropriate
> value (assuming PATH and LD_LIBRARY_PATH are set too) is not enough to
> let OpenMPI rock & roll from the new location.
>
> That is because all the files containing settings for opal_wrapper,
> which are located in share/openmpi/ and called e.g.
> mpif77-wrapper-data.txt, also contain paths hard-coded at installation
> time by --prefix.
>
> I have fixed the problem by parsing all the files share/openmpi/*.txt
> and replacing the old path with the new one. This nasty solution seems
> to work.
>
> But is there an elegant way to do this correctly, maybe by
> re-generating the config files in share/openmpi/?
>
> And last but not least, the FAQ on the web site you provided (see link
> above) does not contain any info on the need to modify the wrapper
> configuration files. Maybe this section should be updated?
>
> Best regards,
> Paul Kapinos
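For reference, the brute-force rewrite Paul describes might look like the sketch below, reusing the example prefixes from his post. Note that sed -i is a GNU sed extension; on Solaris one would redirect to a temporary file instead:

    $ cd /my/love/path/for/openmpi/blupp/share/openmpi
    $ # rewrite the old hard-coded --prefix path in every wrapper config file
    $ sed -i 's|/my/love/path/for/openmpi/tmp1|/my/love/path/for/openmpi/blupp|g' \
          *-wrapper-data.txt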
>
>
>> On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:
>>
>>> Hi all!
>>>
>>> We are using OpenMPI on a variety of machines (running Linux,
>>> Solaris/SPARC and /Opteron) with a couple of compilers (GCC, Sun
>>> Studio, Intel, PGI, 32 and 64 bit...), so we have at least 15
>>> versions of each release of OpenMPI (Sun Cluster Tools not included).
>>>
>>> So we have to support a complete petting zoo of OpenMPIs, and
>>> sometimes we may need to move things around.
>>>
>>> When OpenMPI is configured, the install path may be provided using
>>> the --prefix keyword, like so:
>>>
>>> ./configure --prefix=/my/love/path/for/openmpi/tmp1
>>>
>>> After "gmake all install", an installation of OpenMPI can be found
>>> in ...tmp1.
>>>
>>> Then, say, we need to *move* this version to another path, say
>>> /my/love/path/for/openmpi/blupp
>>>
>>> Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
>>> can do that ;-)
>>>
>>> And if we try to use OpenMPI from the new location, we get an error
>>> message like
>>>
>>> $ ./mpicc
>>> Cannot open configuration file
>>> /my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
>>> Error parsing data file mpicc: Not found
>>>
>>> (note the old installation path being used)
>>>
>>> It looks to me like the install path provided with --prefix at
>>> configuration time is compiled into the opal_wrapper executable, and
>>> opal_wrapper works only if the set of configuration files is in this
>>> path. But after a move of the OpenMPI installation directory the
>>> configuration files aren't there...
>>>
>>> A side effect of this behaviour is that binary distributions of
>>> OpenMPI (RPMs) are not relocatable. That's uncomfortable. (Actually,
>>> this mail was prompted by the fact that Sun ClusterTools RPMs are not
>>> relocatable.)
>>>
>>> So, does this behaviour have a deeper sense I cannot recognise, or is
>>> the compiling-in of global paths maybe not needed? What I mean is
>>> that the paths to the configuration files, which opal_wrapper needs,
>>> could be set relative, like ../share/openmpi/***, without affecting
>>> the integrity of OpenMPI. Maybe there are more places where relative
>>> paths would be needed to allow a movable (relocatable) OpenMPI.
>>>
>>> What do you think?
>>>
>>> Best regards,
>>> Paul Kapinos
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 15 Sep 2008 08:46:11 -0700
> From: Eugene Loh <eugene....@sun.com>
> Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send + MPI_RECV ?
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <48ce8343.7060...@sun.com>
> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>
> Aurélien Bouteiller wrote:
>
>> You can't assume that MPI_Send does buffering.
>
> Yes, but I think this is what Eric meant when he said he had
> misinterpreted Enrico's problem. The communication pattern is to send a
> message, which is received remotely. There is remote computation, and
> then data is sent back. No buffering is needed for such a pattern. The
> code is "apparently" legal. There is apparently something else going on
> in the "real" code that is not captured in the example Enrico sent.
>
> Further, if I understand correctly, the remote process actually
> receives the data! If this is true, the example is as simple as:
>
> process 1:
>     MPI_Send()   // this call blocks
>
> process 0:
>     MPI_Recv()   // this call actually receives the data sent by MPI_Send!!!
>
> Enrico originally explained that process 0 actually receives the data.
> So, MPI's internal buffering is presumably not a problem at all! An
> MPI_Send effectively sends data to a remote process, but simply never
> returns control to the user program.
>
>> Without buffering, you are in a possible deadlock situation. This
>> pathological case is the exact motivation for the existence of
>> MPI_Sendrecv. You can also consider Isend/Recv/Wait; then the Send
>> will never block, even if the destination is not ready to receive. Or
>> MPI_Bsend, which will add explicit buffering and therefore return
>> control to you before the message transmission has actually begun.
>>
>> Aurelien
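To make the Isend/Recv/Wait variant Aurelien mentions concrete, here is a minimal Fortran sketch in terms of rank 0's loop in Enrico's example further down; the request handle is the only new variable:

    ! start the send without blocking, complete the receive, then wait
    ! for the send to finish; the send can no longer hold up the receive
    integer :: request, status(MPI_STATUS_SIZE), ierr
    call MPI_ISEND(a, 5, MPI_INTEGER, 1, k, MPI_COMM_WORLD, request, ierr)
    call MPI_RECV(b, 4, MPI_INTEGER, 1, k, MPI_COMM_WORLD, status, ierr)
    call MPI_WAIT(request, status, ierr)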
>>
>> On 15 Sep 2008, at 01:08, Eric Thibodeau wrote:
>>
>>> Sorry about that, I had misinterpreted your original post as being
>>> the send-receive pair. The example you give below does indeed seem
>>> correct, which means you might have to show us the code that doesn't
>>> work. Note that I am in no way a Fortran expert, I'm more versed in
>>> C. The only hint I'd give a C programmer in this case is "make sure
>>> your receiving structures are indeed large enough" (i.e. you send 3d
>>> but eventually receive 4d... did you allocate for 3d or 4d when
>>> receiving the converted array?).
>>>
>>> Eric
>>>
>>> Enrico Barausse wrote:
>>>
>>>> sorry, I hadn't changed the subject. I'm reposting:
>>>>
>>>> Hi,
>>>>
>>>> I think it's correct. What I want to do is to send a 3d array from
>>>> process 1 to process 0 (= root):
>>>>
>>>> call MPI_SEND(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,ierr)
>>>>
>>>> In some other part of the code, process 0 acts on the 3d array,
>>>> turns it into a 4d one and sends it back to process 1, which
>>>> receives it with
>>>>
>>>> call MPI_RECV(tonode,4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)
>>>>
>>>> In practice, what I do is basically given by this simple code (which
>>>> unfortunately doesn't reproduce the segmentation fault):
>>>>
>>>> a=(/1,2,3,4,5/)
>>>>
>>>> call MPI_INIT(ierr)
>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>>
>>>> if(numprocs/=2) stop
>>>>
>>>> if(id==0) then
>>>>    do k=1,5
>>>>       a=a+1
>>>>       call MPI_SEND(a,5,MPI_INTEGER,1,k,MPI_COMM_WORLD,ierr)
>>>>       call MPI_RECV(b,4,MPI_INTEGER,1,k,MPI_COMM_WORLD,status,ierr)
>>>>    end do
>>>> else
>>>>    do k=1,5
>>>>       call MPI_RECV(a,5,MPI_INTEGER,0,k,MPI_COMM_WORLD,status,ierr)
>>>>       b=a(1:4)
>>>>       call MPI_SEND(b,4,MPI_INTEGER,0,k,MPI_COMM_WORLD,ierr)
>>>>    end do
>>>> end if
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1006, Issue 2
> **************************************
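A closing note on the thread's subject line: in Enrico's example only rank 0 could use MPI_SENDRECV, since rank 1's reply depends on the data it has just received. A minimal sketch of that variant of rank 0's loop:

    ! couple the send of a and the receive of b in one call; this cannot
    ! deadlock against rank 1's receive-then-send
    do k = 1, 5
       a = a + 1
       call MPI_SENDRECV(a, 5, MPI_INTEGER, 1, k, &
                         b, 4, MPI_INTEGER, 1, k, &
                         MPI_COMM_WORLD, status, ierr)
    end do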