Sorry, I should pay more attention when I edit the subject line of the daily digest.
Dear Eric, Aurelien and Eugene,

thanks a lot for helping. What Eugene said summarizes the situation exactly. I agree it's an issue with the full code, since the problem doesn't arise in simple examples like the one I posted. I was just hoping I was doing something trivially wrong and that someone would shout at me :-). I could post the full code, but it's quite long. At the moment I am still going through it looking for the problem, so I'll wait a bit before spamming the other users.

cheers

Enrico

>
> On Mon, Sep 15, 2008 at 6:00 PM, <users-requ...@open-mpi.org> wrote:
>> Send users mailing list submissions to
>>         us...@open-mpi.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         http://www.open-mpi.org/mailman/listinfo.cgi/users
>> or, via email, send a message with subject or body 'help' to
>>         users-requ...@open-mpi.org
>>
>> You can reach the person managing the list at
>>         users-ow...@open-mpi.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Problem using VampirTrace (Thomas Ropars)
>>    2. Re: Why compilig in global paths (only) for configuretion files? (Paul Kapinos)
>>    3. Re: MPI_sendrecv = MPI_Send+ MPI_RECV ? (Eugene Loh)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 15 Sep 2008 15:04:07 +0200
>> From: Thomas Ropars <trop...@irisa.fr>
>> Subject: Re: [OMPI users] Problem using VampirTrace
>> To: Andreas Knüpfer <andreas.knuep...@tu-dresden.de>
>> Cc: us...@open-mpi.org
>> Message-ID: <48ce5d47.50...@irisa.fr>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Hello,
>>
>> I don't have a common file system for all cluster nodes.
>>
>> I've tried to run the application again with VT_UNIFY=no and to call
>> vtunify manually. It works well. I managed to get the .otf file.
>>
>> Thank you.
>>
>> Thomas Ropars
>>
>>
>> Andreas Knüpfer wrote:
>>> Hello Thomas,
>>>
>>> sorry for the delay. My first assumption about the cause of your problem
>>> is the so-called "unify" process. This is a post-processing step which is
>>> performed automatically after the trace run. This step needs read access
>>> to all files, though. So, do you have a common file system for all
>>> cluster nodes?
>>>
>>> If yes, set the env variable VT_PFORM_GDIR to point there. Then the traces
>>> will be copied there from the location VT_PFORM_LDIR, which can still be a
>>> node-local directory. Then everything will be handled automatically.
>>>
>>> If not, please set VT_UNIFY=no in order to disable automatic unification.
>>> Then you need to call vtunify manually. Please copy all files from the run
>>> directory that start with your OTF file prefix to a common directory and call
>>>
>>> %> vtunify <number of processes> <file prefix>
>>>
>>> there. This should give you the <prefix>.otf file.
>>>
>>> Please give this a try. If it is not working, please give me an 'ls -alh'
>>> from your trace directory/directories.
>>>
>>> Best regards, Andreas
>>>
>>>
>>> P.S.: Please have my email on CC, I'm not on the us...@open-mpi.org list.
>>>
>>>
>>>>> From: Thomas Ropars <trop...@irisa.fr>
>>>>> Date: August 11, 2008 3:47:54 PM IST
>>>>> To: us...@open-mpi.org
>>>>> Subject: [OMPI users] Problem using VampirTrace
>>>>> Reply-To: Open MPI Users <us...@open-mpi.org>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to use VampirTrace.
>>>>> I'm working with r19234 of the svn trunk.
>>>>>
>>>>> When I try to run a simple application with 4 processes on the same
>>>>> computer, it works well.
>>>>> But if I try to run the same application with the 4 processes executed
>>>>> on 4 different computers, I never get the .otf file.
>>>>>
>>>>> I've tried to run with VT_VERBOSE=yes, and I get the following trace:
>>>>>
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1] for generation [buffer 32000000 bytes]
>>>>> VampirTrace: Thread object #0 created, total number is 1
>>>>> VampirTrace: Opened OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1] for generation [buffer 32000000 bytes]
>>>>> Ring : Start
>>>>> Ring : End
>>>>> [1]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
>>>>> [2]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
>>>>> [1]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834aca.3040 id 1]
>>>>> [3]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
>>>>> [2]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834bca.3020 id 1]
>>>>> [0]VampirTrace: Flushed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
>>>>> [1]VampirTrace: Wrote unify control file ./ring-vt.2.uctl
>>>>> [2]VampirTrace: Wrote unify control file ./ring-vt.3.uctl
>>>>> [3]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe834fca.3011 id 1]
>>>>> [0]VampirTrace: Closed OTF writer stream [namestub /tmp/ring-vt.fffffffffe8349ca.3294 id 1]
>>>>> [0]VampirTrace: Wrote unify control file ./ring-vt.1.uctl
>>>>> [0]VampirTrace: Checking for ./ring-vt.1.uctl ...
>>>>> [0]VampirTrace: Checking for ./ring-vt.2.uctl ...
>>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.def
>>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.def
>>>>> [3]VampirTrace: Wrote unify control file ./ring-vt.4.uctl
>>>>> [1]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834aca.3040.1.events
>>>>> [2]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834bca.3020.1.events
>>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.def
>>>>> [1]VampirTrace: Thread object #0 deleted, leaving 0
>>>>> [2]VampirTrace: Thread object #0 deleted, leaving 0
>>>>> [3]VampirTrace: Removed trace file /tmp/ring-vt.fffffffffe834fca.3011.1.events
>>>>> [3]VampirTrace: Thread object #0 deleted, leaving 0
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Thomas
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>
>>>
>>
>>
>> ------------------------------
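A minimal sketch of the manual unification workflow Andreas describes above, for the case without a common file system. Everything here except VT_UNIFY, vtunify and the prefix "ring-vt" is an assumption taken for illustration (the host names node1..node4, the binary name ./ring-vt, the ~/traces directory); adapt them to your own job. With a shared file system you would instead set VT_PFORM_GDIR and let VampirTrace unify automatically.

  #!/bin/sh
  # Sketch only: manual OTF unification when the nodes share no file system.
  # Assumed: 4 processes, OTF prefix "ring-vt", local traces under /tmp,
  # hosts node1..node4 (hypothetical names).

  export VT_UNIFY=no            # disable the automatic unify step
  mpirun -np 4 ./ring-vt        # run the instrumented application as usual

  # Gather every file that starts with the OTF prefix into one directory.
  mkdir -p ~/traces
  for host in node1 node2 node3 node4; do
      scp "$host:/tmp/ring-vt.*" ~/traces/
  done
  cp ./ring-vt.*.uctl ~/traces/   # the unify control files were written to the run dir

  # Unify manually: vtunify <number of processes> <file prefix>
  cd ~/traces && vtunify 4 ring-vt   # should produce ring-vt.otf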
>>
>> Message: 2
>> Date: Mon, 15 Sep 2008 17:22:03 +0200
>> From: Paul Kapinos <kapi...@rz.rwth-aachen.de>
>> Subject: Re: [OMPI users] Why compilig in global paths (only) for
>>         configuretion files?
>> To: Open MPI Users <us...@open-mpi.org>, Samuel Sarholz <sarh...@rz.rwth-aachen.de>
>> Message-ID: <48ce7d9b.8070...@rz.rwth-aachen.de>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> Hi Jeff, hi all!
>>
>> Jeff Squyres wrote:
>>> Short answer: yes, we do compile in the prefix path into OMPI. Check
>>> out this FAQ entry; I think it'll solve your problem:
>>>
>>> http://www.open-mpi.org/faq/?category=building#installdirs
>>
>> Yes, reading man pages helps!
>> Thank you for the useful help.
>>
>> But setting the environment variable OPAL_PREFIX to an appropriate value
>> (assuming PATH and LD_LIBRARY_PATH are set too) is not enough to let
>> OpenMPI rock & roll from the new location.
>>
>> That is because all the files containing settings for opal_wrapper, which
>> are located in share/openmpi/ and called e.g. mpif77-wrapper-data.txt,
>> also contain hard-coded paths (defined at installation time by --prefix).
>>
>> I have fixed the problem by parsing all the files share/openmpi/*.txt
>> and replacing the old path with the new path. This nasty solution seems
>> to work.
>>
>> But is there an elegant way to do this correctly, maybe by re-generating
>> the config files in share/openmpi/?
>>
>> And last but not least, the FAQ entry you provided (see link above) does
>> not contain any info on the need to modify the wrapper configuration
>> files. Maybe this section should be updated?
>>
>> Best regards,
>> Paul Kapinos
>>
>>
>>> On Sep 8, 2008, at 5:33 AM, Paul Kapinos wrote:
>>>
>>>> Hi all!
>>>>
>>>> We are using OpenMPI on a variety of machines (running Linux,
>>>> Solaris/Sparc and /Opteron) with a couple of compilers (GCC, Sun
>>>> Studio, Intel, PGI, 32 and 64 bit...), so we have at least 15 versions
>>>> of each release of OpenMPI (Sun Cluster Tools not included).
>>>>
>>>> In other words, we have to support a complete petting zoo of OpenMPIs,
>>>> and sometimes we may need to move things around.
>>>>
>>>> When OpenMPI is configured, the install path may be provided using the
>>>> --prefix keyword, like so:
>>>>
>>>> ./configure --prefix=/my/love/path/for/openmpi/tmp1
>>>>
>>>> After "gmake all install", an installation of OpenMPI may be found in ...tmp1.
>>>>
>>>> Then, say, we need to *move* this version to another path, say
>>>> /my/love/path/for/openmpi/blupp
>>>>
>>>> Of course we have to set $PATH and $LD_LIBRARY_PATH accordingly (we
>>>> can do that ;-)
>>>>
>>>> But if we try to use OpenMPI from the new location, we get an error
>>>> message like
>>>>
>>>> $ ./mpicc
>>>> Cannot open configuration file
>>>> /my/love/path/for/openmpi/tmp1/share/openmpi/mpicc-wrapper-data.txt
>>>> Error parsing data file mpicc: Not found
>>>>
>>>> (note the old installation path being used)
>>>>
>>>> It looks to me as if the install path provided with --prefix at configure
>>>> time is compiled into the opal_wrapper executable, and opal_wrapper only
>>>> works if the set of configuration files is in that path. But after moving
>>>> the OpenMPI installation directory the configuration files aren't there...
>>>>
>>>> A side effect of this behaviour is that binary distributions of OpenMPI
>>>> (RPMs) are not relocatable. That's uncomfortable. (Actually, this mail was
>>>> prompted by the fact that the Sun ClusterTools RPMs are not relocatable.)
>>>>
>>>> So, does this behaviour have a deeper sense that I cannot recognise, or is
>>>> compiling in global paths perhaps not needed at all?
>>>>
>>>> What I mean is that the paths for the configuration files which opal_wrapper
>>>> needs could be set locally, like ../share/openmpi/***, without affecting the
>>>> integrity of OpenMPI. Maybe there are more places where local paths would be
>>>> needed to allow a movable (relocatable) OpenMPI.
>>>>
>>>> What do you think?
>>>>
>>>> Best regards,
>>>> Paul Kapinos
>>>>
>>>>
>>>> <kapinos.vcf>_______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: verwurschel_pfade_openmpi.sh
>> Type: application/x-sh
>> Size: 369 bytes
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.sh>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: kapinos.vcf
>> Type: text/x-vcard
>> Size: 330 bytes
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.vcf>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: smime.p7s
>> Type: application/x-pkcs7-signature
>> Size: 4230 bytes
>> URL: <http://www.open-mpi.org/MailArchives/users/attachments/20080915/434c3679/attachment.bin>
>>
>> ------------------------------
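A minimal sketch of the relocation workaround Paul describes above (and which his attached script apparently automates): point OPAL_PREFIX, PATH and LD_LIBRARY_PATH at the new location, then rewrite the hard-coded prefix inside the wrapper data files. The two paths are the examples from Paul's mail; the in-place edit assumes GNU sed, and whether later Open MPI releases still need this last step is not something this thread settles.

  #!/bin/sh
  # Sketch of the workaround described above -- not an official relocation tool.
  OLD=/my/love/path/for/openmpi/tmp1      # prefix given to ./configure
  NEW=/my/love/path/for/openmpi/blupp     # directory the tree was moved to

  export OPAL_PREFIX="$NEW"
  export PATH="$NEW/bin:$PATH"
  export LD_LIBRARY_PATH="$NEW/lib:$LD_LIBRARY_PATH"

  # The opal_wrapper data files (mpicc-wrapper-data.txt, mpif77-wrapper-data.txt, ...)
  # still contain the old prefix, so rewrite it in place.
  for f in "$NEW"/share/openmpi/*-wrapper-data.txt; do
      sed -i "s|$OLD|$NEW|g" "$f"
  done

  "$NEW"/bin/mpicc --showme   # should now report paths under $NEW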
>>
>> Message: 3
>> Date: Mon, 15 Sep 2008 08:46:11 -0700
>> From: Eugene Loh <eugene....@sun.com>
>> Subject: Re: [OMPI users] MPI_sendrecv = MPI_Send+ MPI_RECV ?
>> To: Open MPI Users <us...@open-mpi.org>
>> Message-ID: <48ce8343.7060...@sun.com>
>> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
>>
>> Aurélien Bouteiller wrote:
>>
>>> You can't assume that MPI_Send does buffering.
>>
>> Yes, but I think this is what Eric meant by misinterpreting Enrico's
>> problem. The communication pattern is to send a message, which is
>> received remotely. There is remote computation, and then data is sent
>> back. No buffering is needed for such a pattern. The code is
>> "apparently" legal. There is apparently something else going on in the
>> "real" code that is not captured in the example Enrico sent.
>>
>> Further, if I understand correctly, the remote process actually receives
>> the data! If this is true, the example is as simple as:
>>
>> process 1:
>>     MPI_Send()   // this call blocks
>>
>> process 0:
>>     MPI_Recv()   // this call actually receives the data sent by MPI_Send!!!
>>
>> Enrico originally explained that process 0 actually receives the data.
>> So, MPI's internal buffering is presumably not a problem at all! An
>> MPI_Send effectively sends data to a remote process, but simply never
>> returns control to the user program.
>>
>>> Without buffering, you are in a possible deadlock situation. This
>>> pathological case is the exact motivation for the existence of
>>> MPI_Sendrecv. You can also consider Isend/Recv/Wait; then the Send will
>>> never block, even if the destination is not ready to receive. Or use
>>> MPI_Bsend, which adds explicit buffering and therefore returns control
>>> to you before the message transmission has actually begun.
>>>
>>> Aurelien
>>>
>>>
>>> Le 15 sept. 08 à 01:08, Eric Thibodeau a écrit :
>>>
>>>> Sorry about that, I had misinterpreted your original post as being
>>>> the pair of send-receive. The example you give below does seem
>>>> correct indeed, which means you might have to show us the code that
>>>> doesn't work. Note that I am in no way a Fortran expert; I'm more
>>>> versed in C. The only hint I'd give a C programmer in this case is:
>>>> make sure your receiving structures are indeed large enough (i.e.,
>>>> you send 3d but eventually receive 4d... did you allocate for 3d or
>>>> 4d for receiving the converted array?).
>>>>
>>>> Eric
>>>>
>>>> Enrico Barausse wrote:
>>>>
>>>>> sorry, I hadn't changed the subject. I'm reposting:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think it's correct. What I want to do is to send a 3d array from
>>>>> process 1 to process 0 (= root):
>>>>>
>>>>> call MPI_SEND(toroot,3,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,ierr)
>>>>>
>>>>> In some other part of the code process 0 acts on the 3d array, turns
>>>>> it into a 4d one and sends it back to process 1, which receives it with
>>>>>
>>>>> call MPI_RECV(tonode,4,MPI_DOUBLE_PRECISION,root,n,MPI_COMM_WORLD,status,ierr)
>>>>>
>>>>> In practice, what I do is basically given by this simple code (which
>>>>> unfortunately doesn't reproduce the segmentation fault):
>>>>>
>>>>>
>>>>> a=(/1,2,3,4,5/)
>>>>>
>>>>> call MPI_INIT(ierr)
>>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
>>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)
>>>>>
>>>>> if(numprocs/=2) stop
>>>>>
>>>>> if(id==0) then
>>>>>         do k=1,5
>>>>>                 a=a+1
>>>>>                 call MPI_SEND(a,5,MPI_INTEGER,1,k,MPI_COMM_WORLD,ierr)
>>>>>                 call MPI_RECV(b,4,MPI_INTEGER,1,k,MPI_COMM_WORLD,status,ierr)
>>>>>         end do
>>>>> else
>>>>>         do k=1,5
>>>>>                 call MPI_RECV(a,5,MPI_INTEGER,0,k,MPI_COMM_WORLD,status,ierr)
>>>>>                 b=a(1:4)
>>>>>                 call MPI_SEND(b,4,MPI_INTEGER,0,k,MPI_COMM_WORLD,ierr)
>>>>>         end do
>>>>> end if
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Sr. Research Associate at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 350
>>> * Knoxville, TN 37996
>>> * 865 974 6321
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> End of users Digest, Vol 1006, Issue 2
>> **************************************
>
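To make Aurelien's suggestion concrete, here is a sketch of Enrico's example loop with rank 0 using MPI_SENDRECV, so that neither side depends on MPI_SEND buffering. The declarations are added for completeness; this only illustrates the pattern and is not a fix for the full code, since the posted example already runs as is.

  program sendrecv_sketch
    ! Sketch: the exchange from the example above, with the send and receive
    ! on rank 0 combined into one MPI_SENDRECV so the pair cannot deadlock.
    use mpi
    implicit none
    integer :: a(5), b(4), id, numprocs, k, ierr
    integer :: status(MPI_STATUS_SIZE)

    a = (/1,2,3,4,5/)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, id, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

    if (numprocs /= 2) then
       call MPI_FINALIZE(ierr)
       stop
    end if

    if (id == 0) then
       do k = 1, 5
          a = a + 1
          ! Send 5 integers to rank 1 and receive 4 back in one call;
          ! MPI progresses both sides, so this does not rely on buffering.
          call MPI_SENDRECV(a, 5, MPI_INTEGER, 1, k, &
                            b, 4, MPI_INTEGER, 1, k, &
                            MPI_COMM_WORLD, status, ierr)
       end do
    else
       do k = 1, 5
          call MPI_RECV(a, 5, MPI_INTEGER, 0, k, MPI_COMM_WORLD, status, ierr)
          b = a(1:4)
          call MPI_SEND(b, 4, MPI_INTEGER, 0, k, MPI_COMM_WORLD, ierr)
       end do
    end if

    call MPI_FINALIZE(ierr)
  end program sendrecv_sketch

An equivalent non-blocking variant would replace rank 0's MPI_SEND with MPI_ISEND, post the MPI_RECV, and then MPI_WAIT on the send request, as Aurelien also mentions.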