Some additional info that may help narrow this down: calls to MPI_SEND do not cause memory corruption; only calls to MPI_RECV do. Since the main difference is that MPI_RECV needs a "status" array and MPI_SEND does not, this seems to indicate to me that something is wrong with status.
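For reference, here is a minimal sketch of the pattern I mean (not my actual helloWorld test program; the program name, buffer, and tag are illustrative), assuming both the application and Open MPI were built with gfortran and -m64 -fdefault-integer-8:

    program recv_status_check
      use mpi
      implicit none
      ! With -fdefault-integer-8, default INTEGER is 8 bytes, matching an
      ! Open MPI built with that flag in FFLAGS/FCFLAGS. If only one side
      ! is built that way, MPI_Recv writes a differently sized status array
      ! than the caller reserved and can clobber neighboring locals such as
      ! "rank".
      integer :: rank, nprocs, ierr
      integer :: status(MPI_STATUS_SIZE)   ! correctly sized status array
      integer :: buf(4)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      if (rank == 0 .and. nprocs > 1) then
         buf = 42
         call MPI_Send(buf, 4, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
      else if (rank == 1) then
         call MPI_Recv(buf, 4, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
         print *, 'rank after MPI_Recv:', rank   ! should still print 1
      end if
      call MPI_Finalize(ierr)
    end program recv_status_check

If "rank" prints garbage after the receive even with a status array declared at MPI_STATUS_SIZE, that would point at an INTEGER-size mismatch between the application and the library rather than at my application code.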
Also, I can run a C version of the helloWorld program with no errors. However, int types there are only 4 bytes; to send 8-byte integers, I define tempInt as long int and pass MPI_LONG as the type.

@Jeff, I got a copy of the Open MPI config.log. See attached.

Cheers,
--Jim

On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96...@gmail.com> wrote:

> Ok, all, where to begin...
>
> Perhaps I should start with the most pressing issue for me. I need 64-bit indexing.
>
> @Martin,
> You indicated that even if I get this up and running, the MPI library still uses signed 32-bit ints to count (your term), or index (my term), the recvbuffer lengths. More concretely, in a call to MPI_Allgatherv( buffer, count, MPI_Integer, recvbuf, recv-count, displ, MPI_integer, MPI_COMM_WORLD, status, mpierr): count, recvcounts, and displs must be 32-bit integers, not 64-bit. Actually, all I need is displs to hold 64-bit values...
> If this is true, then compiling OpenMPI this way is not a solution. I'll have to restructure my code to collect 31-bit chunks...
> Not that it matters, but I'm not using DIRAC, but a custom code to compute circuit analyses.
>
> @Jeff,
> Interesting: your runtime behavior has a different error than mine. You have problems with the passed variable tempInt, which would make sense for the reasons you gave. However, my problem involves the fact that the local variable "rank" gets overwritten by a memory corruption after MPI_RECV is called.
>
> Re: config.log. I will try to have the admin guy recompile tomorrow and see if I can get the log for you.
>
> BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster. I use the options -m64 and -fdefault-integer-8.
>
> Cheers,
> --Jim
>
> On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
>
>> Hi Jim,
>>
>> I have quite a bit of experience with compiling openmpi for dirac. Here is what I use to configure openmpi:
>>
>> ./configure --prefix=$instdir \
>>     --disable-silent-rules \
>>     --enable-mpirun-prefix-by-default \
>>     --with-threads=posix \
>>     --enable-cxx-exceptions \
>>     --with-tm=$torquedir \
>>     --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
>>     --with-openib \
>>     --with-hwloc=$hwlocdir \
>>     CC=gcc \
>>     CXX=g++ \
>>     FC="$FC" \
>>     F77="$FC" \
>>     CFLAGS="-O3" \
>>     CXXFLAGS="-O3" \
>>     FFLAGS="-O3 $I8FLAG" \
>>     FCFLAGS="-O3 $I8FLAG"
>>
>> You need to set FC to either ifort or gfortran (those are the two compilers that I have used) and set I8FLAG to -fdefault-integer-8 for gfortran or -i8 for ifort.
>> Set torquedir to the directory where torque is installed ($torquedir/lib must contain libtorque.so) if you are running jobs under torque; otherwise remove the --with-tm=... line.
>> Set hwlocdir to the directory where you have hwloc installed. You may not need the --with-hwloc=... option because openmpi comes with a hwloc version (I don't have experience with that because we install hwloc independently).
>> Set instdir to the directory where you want to install openmpi.
>> You may or may not need the --with-openib option, depending on whether you have an InfiniBand interconnect.
>>
>> After configure/make/make install, this so-compiled version can be used with dirac without changing the dirac source code. (There is one caveat: you should make sure that all "count" variables in MPI calls in dirac are smaller than 2^31-1. I have run into a few cases when that is not the case; this problem can be overcome by replacing MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce repeatedly.)
>>
>> This is what I use to set up dirac:
>>
>> export PATH=$instdir/bin
>> ./setup --prefix=$diracinstdir \
>>     --fc=mpif90 \
>>     --cc=mpicc \
>>     --int64 \
>>     --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
>>
>> where $instdir is the directory where you installed openmpi from above.
>>
>> I would never use the so-compiled openmpi version for anything other than dirac, though. I am not saying that it cannot work (at a minimum you need to compile Fortran programs with the appropriate I8FLAG), but it is an unnecessary complication: I have not encountered a piece of software other than dirac that requires this.
>>
>> Cheers,
>> Martin
>>
>> --
>> Martin Siegert
>> Head, Research Computing
>> WestGrid/ComputeCanada Site Lead
>> Simon Fraser University
>> Burnaby, British Columbia
>> Canada
>>
>> On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
>> >
>> > Jeff,
>> > Here's what I know:
>> > 1. Checked FAQs. Done
>> > 2. Version 1.6.5
>> > 3. config.log file has been removed by the sysadmin...
>> > 4. ompi_info -a from the head node is attached as headnode.out
>> > 5. N/A
>> > 6. compute node info is attached as compute-x-yy.out
>> > 7. As discussed, local variables are being overwritten after calls to MPI_RECV from Fortran code
>> > 8. ifconfig output from the head node and compute nodes is attached as *-ifconfig.out
>> >
>> > Cheers,
>> > --Jim
>> >
>> > On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> >
>> > Can you send the information listed here: http://www.open-mpi.org/community/help/
>> >
>> > On Oct 30, 2013, at 6:22 PM, Jim Parker <jimparker96...@gmail.com> wrote:
>> > > Jeff and Ralph,
>> > > Ok, I downshifted to a helloWorld example (attached); bottom line, after I hit the MPI_Recv call, my local variable (rank) gets borked.
>> > >
>> > > I have compiled with -m64 -fdefault-integer-8 and have even assigned kind=8 to the integers (which would be the preferred method in my case).
>> > >
>> > > Your help is appreciated.
>> > >
>> > > Cheers,
>> > > --Jim
>> > >
>> > > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> > > On Oct 30, 2013, at 4:35 PM, Jim Parker <jimparker96...@gmail.com> wrote:
>> > >
>> > > > I have recently built a cluster that uses the 64-bit indexing feature of OpenMPI following the directions at
>> > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
>> > >
>> > > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for OMPI 1.6.x).
>> > >
>> > > > My question is what are the new prototypes for the MPI calls?
>> > > > Specifically:
>> > > > MPI_RECV
>> > > > MPI_Allgatherv
>> > >
>> > > They're the same as they've always been.
>> > >
>> > > The magic is that the -i8 flag tells the compiler "make all Fortran INTEGERs be 8 bytes, not (the default) 4." So Ralph's answer was correct in that all the MPI parameters are INTEGERs -- but you can tell the compiler that all INTEGERs are 8 bytes, not 4, and therefore get "large" integers.
>> > >
>> > > Note that this means that you need to compile your application with -i8, too. That will make *your* INTEGERs also be 8 bytes, and then you'll match what Open MPI is doing.
>> > >
>> > > > I'm curious because some of my local variables get killed (set to null) upon my first call to MPI_RECV. Typically, this is due (in Fortran) to someone not setting the 'status' variable to an appropriate array size.
>> > >
>> > > If you didn't compile your application with -i8, this could well be because your application is treating INTEGERs as 4 bytes, but OMPI is treating INTEGERs as 8 bytes. Nothing good can come from that.
>> > >
>> > > If you *did* compile your application with -i8 and you're seeing this kind of wonkyness, we should dig deeper and see what's going on.
>> > >
>> > > > My review of mpif.h and mpi.h seems to indicate that the functions are defined as C int types and therefore, I assume, the coercion during the compile makes the library support 64-bit indexing, i.e. int -> long int.
>> > >
>> > > FWIW: We actually define a type MPI_Fint; its actual type is determined by configure (int or long int, IIRC). When your Fortran code calls C, we use the MPI_Fint type for parameters, and so it will be either a 4- or 8-byte integer type.
>> > >
>> > > --
>> > > Jeff Squyres
>> > > jsquy...@cisco.com
>> > > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>> > >
>> > > <mpi-test-64bit.tar.bz2>
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
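PS: the chunked wrapper Martin describes above (splitting a large MPI_Allreduce so that each call's count stays below 2^31-1) might look roughly like this. This is only a hypothetical sketch, not code from DIRAC or from Martin; the subroutine name and the MPI_DOUBLE_PRECISION/MPI_SUM choice are purely illustrative:

    subroutine chunked_allreduce_sum(sendbuf, recvbuf, n, comm, ierr)
      use mpi
      implicit none
      ! Default INTEGER is 8 bytes under -fdefault-integer-8 / -i8, so "n"
      ! may legitimately exceed 2**31-1; each individual MPI_Allreduce call
      ! is handed a count that stays safely below that limit.
      integer, intent(in)           :: n, comm
      double precision, intent(in)  :: sendbuf(n)
      double precision, intent(out) :: recvbuf(n)
      integer, intent(out)          :: ierr
      integer :: offset, chunk
      integer, parameter :: max_chunk = 2**30   ! comfortably below 2**31 - 1

      offset = 1
      do while (offset <= n)
         chunk = min(n - offset + 1, max_chunk)
         call MPI_Allreduce(sendbuf(offset), recvbuf(offset), chunk, &
                            MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
         if (ierr /= MPI_SUCCESS) return
         offset = offset + chunk
      end do
    end subroutine chunked_allreduce_sum

The same chunking idea would apply to my 31-bit-chunk restructuring of Allgatherv, with the caveat Martin notes that the displacement arrays would still need to fit the library's counting type.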
Attachment: openmpi-1.6.5.config.tar.gz (GNU Zip compressed data)