Ok, all, where to begin... Perhaps I should start with the most pressing issue for me: I need 64-bit indexing.

@Martin, you indicated that even if I get this up and running, the MPI library still uses signed 32-bit ints to count (your term), or index (my term), the receive-buffer lengths. More concretely, in a call like

   call MPI_Allgatherv(buffer, count, MPI_INTEGER, recvbuf, recvcounts, displs, MPI_INTEGER, MPI_COMM_WORLD, mpierr)

the values of count, recvcounts, and displs must still fit in signed 32-bit integers, not 64-bit. Actually, all I need is for displs to hold 64-bit values... If this is true, then compiling OpenMPI this way is not a solution; I'll have to restructure my code to collect the data in 31-bit chunks, i.e., fewer than 2^31 - 1 elements per call... Not that it matters, but I'm not using DIRAC, but a custom code for circuit analysis.

@Jeff, interesting -- your run fails with a different error than mine. You have problems with the passed variable tempInt, which would make sense for the reasons you gave. My problem, however, is that the local variable "rank" gets overwritten by memory corruption after MPI_RECV is called.

Re: config.log. I will try to have the admin recompile tomorrow and see if I can get the log for you. BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster, with the options -m64 and -fdefault-integer-8.

Cheers,
--Jim

On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
> Hi Jim,
>
> I have quite a bit of experience with compiling openmpi for dirac.
> Here is what I use to configure openmpi:
>
> ./configure --prefix=$instdir \
>     --disable-silent-rules \
>     --enable-mpirun-prefix-by-default \
>     --with-threads=posix \
>     --enable-cxx-exceptions \
>     --with-tm=$torquedir \
>     --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
>     --with-openib \
>     --with-hwloc=$hwlocdir \
>     CC=gcc \
>     CXX=g++ \
>     FC="$FC" \
>     F77="$FC" \
>     CFLAGS="-O3" \
>     CXXFLAGS="-O3" \
>     FFLAGS="-O3 $I8FLAG" \
>     FCFLAGS="-O3 $I8FLAG"
>
> You need to set FC to either ifort or gfortran (those are the two compilers
> that I have used) and set I8FLAG to -fdefault-integer-8 for gfortran or
> -i8 for ifort.
> Set torquedir to the directory where torque is installed ($torquedir/lib
> must contain libtorque.so) if you are running jobs under torque; otherwise
> remove the --with-tm=... line.
> Set hwlocdir to the directory where you have hwloc installed. You may not
> need the --with-hwloc=... option because openmpi comes with its own hwloc
> (I don't have experience with that because we install hwloc independently).
> Set instdir to the directory where you want to install openmpi.
> You may or may not need the --with-openib option, depending on whether
> you have an InfiniBand interconnect.
>
> After configure/make/make install, the so-compiled version can be used
> with dirac without changing the dirac source code.
> (There is one caveat: you should make sure that all "count" variables
> in MPI calls in dirac are smaller than 2^31 - 1. I have run into a few
> cases where that is not the case; this problem can be overcome by replacing
> MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce
> repeatedly.) This is what I use to set up dirac:
>
> export PATH=$instdir/bin:$PATH
> ./setup --prefix=$diracinstdir \
>     --fc=mpif90 \
>     --cc=mpicc \
>     --int64 \
>     --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
>
> where $instdir is the directory where you installed openmpi from above.
>
> I would never use the so-compiled openmpi version for anything other
> than dirac, though.
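
To make the MPI_Allreduce caveat above concrete, here is a rough sketch of the kind of chunking wrapper Martin describes -- only a sketch, assuming an in-place reduction on a double-precision buffer and an application built with the same I8FLAG; the subroutine name and chunk limit are illustrative, not anything taken from DIRAC or Open MPI.

   ! allreduce_chunked.f90 (free form; compile with mpif90 -fdefault-integer-8)
   subroutine allreduce_chunked_dp(buf, n, op, comm, ierr)
     implicit none
     include 'mpif.h'
     double precision, intent(inout) :: buf(*)
     integer, intent(in)  :: n, op, comm
     integer, intent(out) :: ierr
     ! Per the caveat above: keep each individual count below 2**31 - 1.
     integer, parameter :: max_chunk = 1000000000
     integer :: offset, chunk

     offset = 0
     do while (offset < n)
        chunk = min(max_chunk, n - offset)
        call MPI_Allreduce(MPI_IN_PLACE, buf(offset+1), chunk, &
                           MPI_DOUBLE_PRECISION, op, comm, ierr)
        if (ierr /= MPI_SUCCESS) return
        offset = offset + chunk
     end do
   end subroutine allreduce_chunked_dp

It would be called just like an in-place MPI_Allreduce, e.g. call allreduce_chunked_dp(a, n, MPI_SUM, MPI_COMM_WORLD, ierr), and n may exceed 2^31 because default INTEGERs are 8 bytes in this build.
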
> I am not saying that it cannot work (at a minimum
> you need to compile Fortran programs with the appropriate I8FLAG),
> but it is an unnecessary complication: I have not encountered a piece
> of software other than dirac that requires this.
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Head, Research Computing
> WestGrid/ComputeCanada Site Lead
> Simon Fraser University
> Burnaby, British Columbia
> Canada
>
> On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
> >
> > Jeff,
> > Here's what I know:
> > 1. Checked FAQs. Done.
> > 2. Version 1.6.5.
> > 3. The config.log file has been removed by the sysadmin...
> > 4. ompi_info -a from the head node is attached as headnode.out.
> > 5. N/A.
> > 6. Compute node info is attached as compute-x-yy.out.
> > 7. As discussed, local variables are being overwritten after calls to
> >    MPI_RECV from Fortran code.
> > 8. ifconfig output from the head node and compute nodes is attached as
> >    *-ifconfig.out.
> >
> > Cheers,
> > --Jim
> >
> > On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres)
> > <jsquy...@cisco.com> wrote:
> >
> >   Can you send the information listed here:
> >   http://www.open-mpi.org/community/help/
> >
> > On Oct 30, 2013, at 6:22 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > > Jeff and Ralph,
> > > Ok, I downshifted to a helloWorld example (attached). Bottom line:
> > > after I hit the MPI_Recv call, my local variable (rank) gets borked.
> > >
> > > I have compiled with -m64 -fdefault-integer-8 and have even assigned
> > > kind=8 to the integers (which would be the preferred method in my case).
> > >
> > > Your help is appreciated.
> > >
> > > Cheers,
> > > --Jim
> > >
> > > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres)
> > > <jsquy...@cisco.com> wrote:
> > > On Oct 30, 2013, at 4:35 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > >
> > > > I have recently built a cluster that uses the 64-bit indexing
> > > > feature of OpenMPI following the directions at
> > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
> > >
> > > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for
> > > OMPI 1.6.x).
> > >
> > > > My question is: what are the new prototypes for the MPI calls?
> > > > Specifically:
> > > > MPI_RECV
> > > > MPI_Allgatherv
> > >
> > > They're the same as they've always been.
> > >
> > > The magic is that the -i8 flag tells the compiler "make all Fortran
> > > INTEGERs 8 bytes, not (the default) 4." So Ralph's answer was correct
> > > in that all the MPI parameters are INTEGERs -- but you can tell the
> > > compiler that all INTEGERs are 8 bytes, not 4, and therefore get
> > > "large" integers.
> > >
> > > Note that this means that you need to compile your application with
> > > -i8, too. That will make *your* INTEGERs also be 8 bytes, and then
> > > you'll match what Open MPI is doing.
> > >
> > > > I'm curious because some of my local variables get killed (set to
> > > > null) upon my first call to MPI_RECV. Typically, this is due (in
> > > > Fortran) to someone not setting the 'status' variable to an
> > > > appropriately sized array.
> > >
> > > If you didn't compile your application with -i8, this could well be
> > > because your application is treating INTEGERs as 4 bytes, but OMPI is
> > > treating INTEGERs as 8 bytes. Nothing good can come from that.
> > >
> > > If you *did* compile your application with -i8 and you're seeing this
> > > kind of wonkiness, we should dig deeper and see what's going on.
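
For reference, a minimal sketch of the kind of self-contained send/receive test under discussion (not the attached helloWorld itself), assuming gfortran with -m64 -fdefault-integer-8 and an Open MPI built with that same flag in FFLAGS/FCFLAGS; the program and variable names are illustrative. The status array declared with MPI_STATUS_SIZE is the detail that most often produces the "rank gets borked" symptom when it is undersized or when integer widths disagree.

   ! recv_test.f90 -- compile with: mpif90 -m64 -fdefault-integer-8 recv_test.f90
   program recv_test
     implicit none
     include 'mpif.h'
     ! With -fdefault-integer-8, every default INTEGER here is 8 bytes,
     ! matching an Open MPI library built with the same flag.
     integer :: rank, nprocs, ierr
     integer :: status(MPI_STATUS_SIZE)   ! full-size status array
     integer :: payload

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

     if (nprocs < 2) then
        if (rank == 0) print *, 'run with at least 2 processes'
     else if (rank == 0) then
        payload = 42
        call MPI_Send(payload, 1, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
     else if (rank == 1) then
        call MPI_Recv(payload, 1, MPI_INTEGER, 0, 99, MPI_COMM_WORLD, status, ierr)
        print *, 'rank', rank, 'received', payload   ! rank should still be 1
     end if

     call MPI_Finalize(ierr)
   end program recv_test

Run with mpirun -np 2. If "rank" still prints as 1 after the receive, the application's and the library's integer sizes are consistent; if it prints garbage, something is still mismatched.
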
> > > > My review of mpif.h and mpi.h seems to indicate that the functions
> > > > are defined as C int types and therefore, I assume, the coercion
> > > > during the compile makes the library support 64-bit indexing, i.e.,
> > > > int -> long int.
> > >
> > > FWIW: We actually define a type MPI_Fint; its actual type is
> > > determined by configure (int or long int, IIRC). When your Fortran
> > > code calls C, we use the MPI_Fint type for parameters, and so it will
> > > be either a 4-byte or an 8-byte integer type.
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > (Attachment: mpi-test-64bit.tar.bz2)
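
As a closing aside (not something from the thread itself): one quick way to confirm that an application and the Open MPI library agree on the Fortran INTEGER width discussed above is to compare bit_size() of a default INTEGER with the size MPI_Type_size reports for MPI_INTEGER. A minimal sketch, with illustrative names:

   ! intcheck.f90 -- compile with: mpif90 -m64 -fdefault-integer-8 intcheck.f90
   program intcheck
     implicit none
     include 'mpif.h'
     integer :: ierr, lib_bytes, app_bytes, dummy
     call MPI_Init(ierr)
     app_bytes = bit_size(dummy) / 8          ! 8 with -fdefault-integer-8, else 4
     call MPI_Type_size(MPI_INTEGER, lib_bytes, ierr)
     print *, 'application INTEGER bytes:', app_bytes, &
              '  MPI_INTEGER bytes reported by the library:', lib_bytes
     call MPI_Finalize(ierr)
   end program intcheck

Run as a single-process job (mpirun -np 1 ./intcheck); both numbers should be 8 when both the application and Open MPI were built with -fdefault-integer-8. If the numbers differ (or the program misbehaves), that is the 4-byte-vs-8-byte mismatch Jeff warns about.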