Jim --

This has been bugging me for a while. I finally got to check today: it looks like we compute MPI_STATUS_SIZE correctly in both the 32- and 64-bit cases. That is, MPI_STATUS_SIZE exactly reflects the size of the C MPI_Status (4 ints and a size_t, or 4 * 4 + 8 = 24 bytes), regardless of whether Fortran INTEGERs are 32 or 64 bits.

There are two implications here:

1. OMPI should not be overwriting your status array if you're using MPI_STATUS_SIZE as its length. Meaning: in the 32-bit case, MPI_STATUS_SIZE=6, and in the 64-bit case, MPI_STATUS_SIZE=3.

2. In the 64-bit case, you'll have a difficult time extracting the MPI status values from the 8-byte INTEGERs in the status array in Fortran (because the first 2 of the 3 will each really be 2 4-byte integers).

So while #2 is a little weird (and probably should be fixed), a properly sized MPI_STATUS_SIZE array shouldn't be causing any problems. So I'm still a little befuddled as to why you're seeing an error. :-\
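For concreteness, here is a minimal Fortran sketch of the receive side under discussion (program and variable names are illustrative, not Jim's code; it assumes mpif90 from an Open MPI build like the one described further down, compiled with -m64 -fdefault-integer-8, and at least two ranks, e.g. mpirun -np 2):

  ! Minimal sketch only.  MPI_STATUS_SIZE comes from the installed mpif.h:
  ! 6 in the 32-bit-INTEGER build, 3 in the -i8 build, so the status array
  ! below is 24 bytes either way and should not be overrun by MPI_RECV.
  program recv_status_sketch
    implicit none
    include 'mpif.h'
    integer :: ierr, rank, val
    integer :: status(MPI_STATUS_SIZE)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

    if (rank == 0) then
       val = 42
       call MPI_SEND(val, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
       call MPI_RECV(val, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
       ! Caveat #2 above: with 8-byte INTEGERs the packed 4-byte C fields may
       ! not line up with status(MPI_SOURCE) / status(MPI_TAG) as they
       ! normally do.  If the status is not needed at all, passing
       ! MPI_STATUS_IGNORE instead of status (the workaround suggested
       ! further down in the thread) sidesteps the issue entirely.
    end if

    call MPI_FINALIZE(ierr)
  end program recv_status_sketch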
On Nov 1, 2013, at 4:51 PM, Jim Parker <jimparker96...@gmail.com> wrote:

> @Jeff,
>
> Well, it may have been "just for giggles", but it Worked!! My helloWorld program ran as expected. My original code ran through the initialization parts without overwriting data. It will take a few days to finish computation and analysis to ensure it ran as expected. I'll report back when I get done.
>
> It looks like a good Friday!
>
> Cheers,
> --Jim
>
> On Thu, Oct 31, 2013 at 4:06 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> For giggles, try using MPI_STATUS_IGNORE (assuming you don't need to look at the status at all). See if that works for you.
>
> Meaning: I wonder if we're computing the status size for Fortran incorrectly in the -i8 case...
>
> On Oct 31, 2013, at 1:58 PM, Jim Parker <jimparker96...@gmail.com> wrote:
>
> > Some additional info that may jog some solutions. Calls to MPI_SEND do not cause memory corruption. Only calls to MPI_RECV. Since the main difference is the fact that MPI_RECV needs a "status" array and SEND does not, this seems to indicate to me that something is wrong with status.
> >
> > Also, I can run a C version of the helloWorld program with no errors. However, int types are only 4-byte. To send 8-byte integers, I define tempInt as long int and pass MPI_LONG as a type.
> >
> > @Jeff,
> > I got a copy of the openmpi conf.log. See attached.
> >
> > Cheers,
> > --Jim
> >
> > On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > Ok, all, where to begin...
> >
> > Perhaps I should start with the most pressing issue for me. I need 64-bit indexing.
> >
> > @Martin,
> > You indicated that even if I get this up and running, the MPI library still uses signed 32-bit ints to count (your term), or index (my term), the recvbuffer lengths. More concretely, in a call to MPI_Allgatherv(buffer, count, MPI_INTEGER, recvbuf, recvcounts, displs, MPI_INTEGER, MPI_COMM_WORLD, mpierr): count, recvcounts, and displs must be 32-bit integers, not 64-bit. Actually, all I need is displs to hold 64-bit values...
> > If this is true, then compiling OpenMPI this way is not a solution. I'll have to restructure my code to collect 31-bit chunks (see the chunked-MPI_Allreduce sketch after this message)...
> > Not that it matters, but I'm not using DIRAC, but a custom code to compute circuit analyses.
> >
> > @Jeff,
> > Interesting, your runtime behavior has a different error than mine. You have problems with the passed variable tempInt, which would make sense for the reasons you gave. However, my problem involves the fact that the local variable "rank" gets overwritten by a memory corruption after MPI_RECV is called.
> >
> > Re: config.log. I will try to have the admin guy recompile tomorrow and see if I can get the log for you.
> >
> > BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster. I use the options -m64 and -fdefault-integer-8.
> >
> > Cheers,
> > --Jim
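A minimal sketch of the chunking idea Jim mentions above, and of the kind of wrapper Martin describes below (calling MPI_Allreduce repeatedly so that no single count exceeds 2**31-1). The subroutine name, the double precision buffer, and the MPI_SUM operation are illustrative assumptions, not code from the thread; it assumes the whole application is compiled with -fdefault-integer-8 so that n and offset are 8-byte INTEGERs:

  ! Reduce a large buffer in slices so every count passed to MPI fits in a
  ! signed 32-bit integer, even though the total length n may not.
  subroutine allreduce_in_chunks(buf, n, comm, ierr)
    implicit none
    include 'mpif.h'
    integer, intent(in)             :: n, comm
    double precision, intent(inout) :: buf(n)
    integer, intent(out)            :: ierr
    integer :: offset, chunk
    integer, parameter :: max_count = 2**30   ! keep every count below 2**31-1

    offset = 1
    do while (offset <= n)
       chunk = min(max_count, n - offset + 1)
       ! MPI_IN_PLACE reduces buf(offset:offset+chunk-1) across the communicator
       call MPI_ALLREDUCE(MPI_IN_PLACE, buf(offset), chunk, &
                          MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
       if (ierr /= MPI_SUCCESS) return
       offset = offset + chunk
    end do
  end subroutine allreduce_in_chunks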
> > On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
> > Hi Jim,
> >
> > I have quite a bit of experience with compiling openmpi for dirac. Here is what I use to configure openmpi:
> >
> > ./configure --prefix=$instdir \
> >             --disable-silent-rules \
> >             --enable-mpirun-prefix-by-default \
> >             --with-threads=posix \
> >             --enable-cxx-exceptions \
> >             --with-tm=$torquedir \
> >             --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
> >             --with-openib \
> >             --with-hwloc=$hwlocdir \
> >             CC=gcc \
> >             CXX=g++ \
> >             FC="$FC" \
> >             F77="$FC" \
> >             CFLAGS="-O3" \
> >             CXXFLAGS="-O3" \
> >             FFLAGS="-O3 $I8FLAG" \
> >             FCFLAGS="-O3 $I8FLAG"
> >
> > You need to set FC to either ifort or gfortran (those are the two compilers that I have used) and set I8FLAG to -fdefault-integer-8 for gfortran or -i8 for ifort.
> > Set torquedir to the directory where torque is installed ($torquedir/lib must contain libtorque.so) if you are running jobs under torque; otherwise remove the --with-tm=... line.
> > Set hwlocdir to the directory where you have hwloc installed. You may not need the --with-hwloc=... option because openmpi comes with a hwloc version (I don't have experience with that because we install hwloc independently).
> > Set instdir to the directory where you want to install openmpi.
> > You may or may not need the --with-openib option depending on whether you have an Infiniband interconnect.
> >
> > After configure/make/make install, the so-compiled version can be used with dirac without changing the dirac source code.
> > (There is one caveat: you should make sure that all "count" variables in MPI calls in dirac are smaller than 2^31-1. I have run into a few cases where that is not the case; this problem can be overcome by replacing MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce repeatedly.) This is what I use to set up dirac:
> >
> > export PATH=$instdir/bin
> > ./setup --prefix=$diracinstdir \
> >         --fc=mpif90 \
> >         --cc=mpicc \
> >         --int64 \
> >         --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
> >
> > where $instdir is the directory where you installed openmpi from above.
> >
> > I would never use the so-compiled openmpi version for anything other than dirac, though. I am not saying that it cannot work (at a minimum you need to compile Fortran programs with the appropriate I8FLAG), but it is an unnecessary complication: I have not encountered a piece of software other than dirac that requires this.
> >
> > Cheers,
> > Martin
> >
> > --
> > Martin Siegert
> > Head, Research Computing
> > WestGrid/ComputeCanada Site Lead
> > Simon Fraser University
> > Burnaby, British Columbia
> > Canada
> >
> > On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
> > >
> > > Jeff,
> > > Here's what I know:
> > > 1. Checked FAQs. Done
> > > 2. Version 1.6.5
> > > 3. config.log file has been removed by the sysadmin...
> > > 4. ompi_info -a from the head node is attached as headnode.out
> > > 5. N/A
> > > 6. compute node info is attached as compute-x-yy.out
> > > 7. As discussed, local variables are being overwritten after calls to MPI_RECV from Fortran code
> > > 8. ifconfig output from the head node and compute nodes is attached as *-ifconfig.out
> > >
> > > Cheers,
> > > --Jim
> > >
> > > On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > >
> > > Can you send the information listed here:
> > > http://www.open-mpi.org/community/help/
> > >
> > > On Oct 30, 2013, at 6:22 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > > > Jeff and Ralph,
> > > > Ok, I downshifted to a helloWorld example (attached); bottom line, after I hit the MPI_Recv call, my local variable (rank) gets borked.
> > > >
> > > > I have compiled with -m64 -fdefault-integer-8 and have even assigned kind=8 to the integers (which would be the preferred method in my case).
> > > >
> > > > Your help is appreciated.
> > > >
> > > > Cheers,
> > > > --Jim
> > > >
> > > > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > > > On Oct 30, 2013, at 4:35 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > > >
> > > > > I have recently built a cluster that uses the 64-bit indexing feature of OpenMPI following the directions at
> > > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
> > > >
> > > > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for OMPI 1.6.x).
> > > >
> > > > > My question is what are the new prototypes for the MPI calls?
> > > > > Specifically:
> > > > > MPI_RECV
> > > > > MPI_Allgatherv
> > > >
> > > > They're the same as they've always been.
> > > >
> > > > The magic is that the -i8 flag tells the compiler "make all Fortran INTEGERs be 8 bytes, not (the default) 4." So Ralph's answer was correct in that all the MPI parameters are INTEGERs -- but you can tell the compiler that all INTEGERs are 8 bytes, not 4, and therefore get "large" integers.
> > > >
> > > > Note that this means that you need to compile your application with -i8, too. That will make *your* INTEGERs also be 8 bytes, and then you'll match what Open MPI is doing.
> > > >
> > > > > I'm curious because some of my local variables get killed (set to null) upon my first call to MPI_RECV. Typically, this is due (in Fortran) to someone not setting the 'status' variable to an appropriate array size.
> > > >
> > > > If you didn't compile your application with -i8, this could well be because your application is treating INTEGERs as 4 bytes, but OMPI is treating INTEGERs as 8 bytes. Nothing good can come from that.
> > > >
> > > > If you *did* compile your application with -i8 and you're seeing this kind of wonkyness, we should dig deeper and see what's going on.
> > > >
> > > > > My review of mpif.h and mpi.h seems to indicate that the functions are defined as C int types and therefore, I assume, the coercion during the compile makes the library support 64-bit indexing, i.e., int -> long int.
> > > >
> > > > FWIW: We actually define a type MPI_Fint; its actual type is determined by configure (int or long int, IIRC). When your Fortran code calls C, we use the MPI_Fint type for parameters, and so it will be either a 4 or 8 byte integer type.
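As a quick way to check both halves of the -i8 matching question, here is a tiny sanity-check program (purely illustrative, not from the thread): it prints the width of the application's default INTEGER, which should be 64 bits when compiled with -m64 -fdefault-integer-8, and the MPI_STATUS_SIZE declared by the installed mpif.h, which per the sizes discussed at the top of this message is 6 for a 32-bit-INTEGER build of Open MPI and 3 for an -i8 build:

  ! Compile with the same mpif90 and flags as the application; the two
  ! printed values show whether the application's INTEGER size and the
  ! library's idea of a Fortran status array are consistent.
  program check_i8
    implicit none
    include 'mpif.h'
    integer :: ierr
    call MPI_INIT(ierr)
    print *, 'bits per default INTEGER :', bit_size(ierr)
    print *, 'MPI_STATUS_SIZE          :', MPI_STATUS_SIZE
    call MPI_FINALIZE(ierr)
  end program check_i8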
> > > > > > > > -- > > > > Jeff Squyres > > > > [7]jsquy...@cisco.com > > > > For corporate legal information go to: > > > [8]http://www.cisco.com/web/about/doing_business/legal/cri/ > > > > > > > > _______________________________________________ > > > > users mailing list > > > > [9]us...@open-mpi.org > > > > [10]http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > > > > > > > > > > <mpi-test-64bit.tar.bz2>____________________________________________ > > > ___ > > > > > > > users mailing list > > > > [11]us...@open-mpi.org > > > > [12]http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > > > Jeff Squyres > > > [13]jsquy...@cisco.com > > > For corporate legal information go to: > > > [14]http://www.cisco.com/web/about/doing_business/legal/cri/ > > > _______________________________________________ > > > users mailing list > > > [15]us...@open-mpi.org > > > [16]http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > > References > > > > > > 1. mailto:jsquy...@cisco.com > > > 2. http://www.open-mpi.org/community/help/ > > > 3. mailto:jimparker96...@gmail.com > > > 4. mailto:jsquy...@cisco.com > > > 5. mailto:jimparker96...@gmail.com > > > 6. > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers > > > 7. mailto:jsquy...@cisco.com > > > 8. http://www.cisco.com/web/about/doing_business/legal/cri/ > > > 9. mailto:us...@open-mpi.org > > > 10. http://www.open-mpi.org/mailman/listinfo.cgi/users > > > 11. mailto:us...@open-mpi.org > > > 12. http://www.open-mpi.org/mailman/listinfo.cgi/users > > > 13. mailto:jsquy...@cisco.com > > > 14. http://www.cisco.com/web/about/doing_business/legal/cri/ > > > 15. mailto:us...@open-mpi.org > > > 16. http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > > > _______________________________________________ > > > users mailing list > > > us...@open-mpi.org > > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > _______________________________________________ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > > > > <openmpi-1.6.5.config.tar.gz>_______________________________________________ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/