@Jeff,

Well, it may have been "just for giggles", but it worked!  My helloWorld
program now runs as expected, and my original code gets through the
initialization without overwriting any data.  It will take a few days to
finish the computation and analysis and confirm everything ran as
expected; I'll report back when it's done.
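For anyone searching the archives later, the change that made the
difference is the last argument to MPI_RECV.  Here is a minimal sketch of
the idea (not the actual helloWorld source; the buffer contents and names
are made up, and it needs at least two ranks), compiled with
mpif90 -m64 -fdefault-integer-8 as before:

  program hello_ignore
    use mpi
    implicit none
    integer :: ierr, rank, buf

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    if (rank == 0) then
       buf = 42
       call MPI_Send(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
       ! MPI_STATUS_IGNORE tells the library not to write a status back,
       ! so a disagreement over MPI_STATUS_SIZE cannot scribble over
       ! neighbouring local variables (such as "rank").
       call MPI_Recv(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, &
                     MPI_STATUS_IGNORE, ierr)
       print *, 'rank', rank, 'received', buf
    end if

    call MPI_Finalize(ierr)
  end program hello_ignore

With an explicit status you would instead declare
integer :: status(MPI_STATUS_SIZE), taking MPI_STATUS_SIZE from the
-i8-built library's mpi module or mpif.h.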
It looks like a good Friday!

Cheers,
--Jim


On Thu, Oct 31, 2013 at 4:06 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> For giggles, try using MPI_STATUS_IGNORE (assuming you don't need to look
> at the status at all).  See if that works for you.
>
> Meaning: I wonder if we're computing the status size for Fortran
> incorrectly in the -i8 case...
>
>
> On Oct 31, 2013, at 1:58 PM, Jim Parker <jimparker96...@gmail.com> wrote:
>
> > Some additional info that may jog some solutions.  Calls to MPI_SEND do
> > not cause memory corruption; only calls to MPI_RECV do.  Since the main
> > difference is that MPI_RECV needs a "status" array and SEND does not,
> > that seems to indicate something is wrong with status.
> >
> > Also, I can run a C version of the helloWorld program with no errors.
> > However, the int types there are only 4 bytes; to send 8-byte integers,
> > I declare tempInt as long int and pass MPI_LONG as the datatype.
> >
> > @Jeff,
> > I got a copy of the openmpi config.log.  See attached.
> >
> > Cheers,
> > --Jim
> >
> > On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > Ok, all, where to begin...
> >
> > Perhaps I should start with the most pressing issue for me: I need
> > 64-bit indexing.
> >
> > @Martin,
> > You indicated that even if I get this up and running, the MPI library
> > still uses signed 32-bit ints to count (your term), or index (my term),
> > the receive-buffer lengths.  More concretely, in a call like
> >
> >   call MPI_Allgatherv(sendbuf, count, MPI_INTEGER, recvbuf, recvcounts,
> >                       displs, MPI_INTEGER, MPI_COMM_WORLD, mpierr)
> >
> > the values in count, recvcounts, and displs must fit in 32-bit integers,
> > not 64-bit.  Actually, all I need is for displs to hold 64-bit values...
> > If this is true, then compiling OpenMPI this way is not a solution; I'll
> > have to restructure my code to collect chunks smaller than 2^31
> > elements...
> >
> > Not that it matters, but I'm not using DIRAC; this is a custom code for
> > circuit analyses.
> >
> > @Jeff,
> > Interesting, your runtime behavior shows a different error than mine.
> > You have problems with the passed variable tempInt, which would make
> > sense for the reasons you gave.  However, my problem is that the local
> > variable "rank" gets overwritten by memory corruption after MPI_RECV is
> > called.
> >
> > Re: config.log.  I will try to have the admin recompile tomorrow and see
> > if I can get the log for you.
> >
> > BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster,
> > with the options -m64 and -fdefault-integer-8.
> >
> > Cheers,
> > --Jim
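For reference, a minimal runnable sketch of the call in question (sizes,
names, and the uneven block layout are made up; the counts and
displacements are ordinary INTEGERs, so they become 8 bytes under
-fdefault-integer-8, but per Martin's caveat quoted below each individual
value still has to stay below 2^31 - 1):

  program gather_sketch
    use mpi
    implicit none
    integer :: ierr, rank, nprocs, nloc, i
    integer, allocatable :: recvcounts(:), displs(:)
    integer, allocatable :: sendbuf(:), recvbuf(:)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    nloc = rank + 1                 ! deliberately uneven block sizes
    allocate(sendbuf(nloc))
    sendbuf = rank

    ! recvcounts and displs are plain INTEGERs: 8 bytes under
    ! -fdefault-integer-8, but each value still has to fit in a signed
    ! 32-bit int inside the library.
    allocate(recvcounts(nprocs), displs(nprocs))
    do i = 1, nprocs
       recvcounts(i) = i            ! rank i-1 contributes i values
    end do
    displs(1) = 0
    do i = 2, nprocs
       displs(i) = displs(i-1) + recvcounts(i-1)
    end do

    allocate(recvbuf(sum(recvcounts)))
    call MPI_Allgatherv(sendbuf, nloc, MPI_INTEGER, &
                        recvbuf, recvcounts, displs, MPI_INTEGER, &
                        MPI_COMM_WORLD, ierr)

    call MPI_Finalize(ierr)
  end program gather_sketch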
> > On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
> > Hi Jim,
> >
> > I have quite a bit of experience with compiling openmpi for dirac.
> > Here is what I use to configure openmpi:
> >
> > ./configure --prefix=$instdir \
> >             --disable-silent-rules \
> >             --enable-mpirun-prefix-by-default \
> >             --with-threads=posix \
> >             --enable-cxx-exceptions \
> >             --with-tm=$torquedir \
> >             --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
> >             --with-openib \
> >             --with-hwloc=$hwlocdir \
> >             CC=gcc \
> >             CXX=g++ \
> >             FC="$FC" \
> >             F77="$FC" \
> >             CFLAGS="-O3" \
> >             CXXFLAGS="-O3" \
> >             FFLAGS="-O3 $I8FLAG" \
> >             FCFLAGS="-O3 $I8FLAG"
> >
> > You need to set FC to either ifort or gfortran (those are the two
> > compilers that I have used) and set I8FLAG to -fdefault-integer-8 for
> > gfortran or -i8 for ifort.
> > Set torquedir to the directory where torque is installed ($torquedir/lib
> > must contain libtorque.so) if you are running jobs under torque;
> > otherwise remove the --with-tm=... line.
> > Set hwlocdir to the directory where you have hwloc installed.  You may
> > not need the --with-hwloc=... option because openmpi comes with its own
> > hwloc (I don't have experience with that because we install hwloc
> > independently).
> > Set instdir to the directory where you want to install openmpi.
> > You may or may not need the --with-openib option, depending on whether
> > you have an InfiniBand interconnect.
> >
> > After configure/make/make install, this so-compiled version can be used
> > with dirac without changing the dirac source code.  (There is one
> > caveat: you should make sure that all "count" variables in MPI calls in
> > dirac are smaller than 2^31-1.  I have run into a few cases where that
> > is not true; this problem can be overcome by replacing MPI_Allreduce
> > calls in dirac with a wrapper that calls MPI_Allreduce repeatedly.)
> > This is what I use to set up dirac:
> >
> > export PATH=$instdir/bin
> > ./setup --prefix=$diracinstdir \
> >         --fc=mpif90 \
> >         --cc=mpicc \
> >         --int64 \
> >         --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
> >
> > where $instdir is the directory where you installed openmpi from above.
> >
> > I would never use the so-compiled openmpi version for anything other
> > than dirac, though.  I am not saying that it cannot work (at a minimum
> > you need to compile Fortran programs with the appropriate I8FLAG), but
> > it is an unnecessary complication: I have not encountered a piece of
> > software other than dirac that requires this.
> >
> > Cheers,
> > Martin
> >
> > --
> > Martin Siegert
> > Head, Research Computing
> > WestGrid/ComputeCanada Site Lead
> > Simon Fraser University
> > Burnaby, British Columbia
> > Canada
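A minimal sketch of the kind of wrapper Martin mentions above, splitting
one large reduction into pieces whose counts each stay below 2^31 - 1
(the chunk size, the names, and the choice of a double-precision sum are
illustrative assumptions, not code from dirac or the thread):

  ! Chunked MPI_Allreduce wrapper: each call's count stays below 2^31 - 1
  ! even when the total length n (an 8-byte INTEGER under -i8) does not.
  ! Assumes distinct send/recv buffers.
  subroutine allreduce_sum_chunked(sendbuf, recvbuf, n, comm, ierr)
    use mpi
    implicit none
    integer, intent(in)           :: n, comm
    integer, intent(out)          :: ierr
    double precision, intent(in)  :: sendbuf(n)
    double precision, intent(out) :: recvbuf(n)
    integer, parameter :: maxchunk = 2**30     ! comfortably below 2^31 - 1
    integer :: offset, nchunk

    offset = 0
    do while (offset < n)
       nchunk = min(maxchunk, n - offset)
       call MPI_Allreduce(sendbuf(offset+1), recvbuf(offset+1), nchunk, &
                          MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
       if (ierr /= MPI_SUCCESS) return
       offset = offset + nchunk
    end do
  end subroutine allreduce_sum_chunked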
> > On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
> > >
> > > Jeff,
> > > Here's what I know:
> > >   1. Checked FAQs.  Done
> > >   2. Version 1.6.5
> > >   3. The config.log file has been removed by the sysadmin...
> > >   4. ompi_info -a from the head node is attached as headnode.out
> > >   5. N/A
> > >   6. Compute node info is attached as compute-x-yy.out
> > >   7. As discussed, local variables are being overwritten after calls
> > >      to MPI_RECV from Fortran code
> > >   8. ifconfig output from the head node and compute nodes is attached
> > >      as *-ifconfig.out
> > >
> > > Cheers,
> > > --Jim
> > >
> > > On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres)
> > > <jsquy...@cisco.com> wrote:
> > >
> > > Can you send the information listed here:
> > > http://www.open-mpi.org/community/help/
> > >
> > > On Oct 30, 2013, at 6:22 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > > > Jeff and Ralph,
> > > > Ok, I downshifted to a helloWorld example (attached).  Bottom line:
> > > > after I hit the MPI_Recv call, my local variable (rank) gets borked.
> > > >
> > > > I have compiled with -m64 -fdefault-integer-8 and have even assigned
> > > > kind=8 to the integers (which would be the preferred method in my
> > > > case).
> > > >
> > > > Your help is appreciated.
> > > >
> > > > Cheers,
> > > > --Jim
> > > >
> > > > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres)
> > > > <jsquy...@cisco.com> wrote:
> > > > On Oct 30, 2013, at 4:35 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> > > >
> > > > > I have recently built a cluster that uses the 64-bit indexing
> > > > > feature of OpenMPI, following the directions at
> > > > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
> > > >
> > > > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for
> > > > OMPI 1.6.x).
> > > >
> > > > > My question is: what are the new prototypes for the MPI calls?
> > > > > Specifically MPI_RECV and MPI_ALLGATHERV.
> > > >
> > > > They're the same as they've always been.
> > > >
> > > > The magic is that the -i8 flag tells the compiler "make all Fortran
> > > > INTEGERs be 8 bytes, not (the default) 4."  So Ralph's answer was
> > > > correct in that all the MPI parameters are INTEGERs -- but you can
> > > > tell the compiler that all INTEGERs are 8 bytes, not 4, and therefore
> > > > get "large" integers.
> > > >
> > > > Note that this means you need to compile your application with -i8,
> > > > too.  That will make *your* INTEGERs also be 8 bytes, and then you'll
> > > > match what Open MPI is doing.
> > > >
> > > > > I'm curious because some of my local variables get killed (set to
> > > > > null) upon my first call to MPI_RECV.  Typically, this is due (in
> > > > > Fortran) to someone not setting the "status" variable to an
> > > > > appropriate array size.
> > > >
> > > > If you didn't compile your application with -i8, this could well be
> > > > because your application is treating INTEGERs as 4 bytes, but OMPI is
> > > > treating INTEGERs as 8 bytes.  Nothing good can come from that.
> > > >
> > > > If you *did* compile your application with -i8 and you're seeing this
> > > > kind of wonkiness, we should dig deeper and see what's going on.
> > > >
> > > > > My review of mpif.h and mpi.h seems to indicate that the functions
> > > > > are defined as C int types and therefore, I assume, the coercion
> > > > > during the compile makes the library support 64-bit indexing, i.e.,
> > > > > int -> long int.
> > > >
> > > > FWIW: we actually define a type MPI_Fint; its actual type is
> > > > determined by configure (int or long int, IIRC).  When your Fortran
> > > > code calls C, we use the MPI_Fint type for parameters, so it will be
> > > > either a 4- or 8-byte integer type.
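Along the lines of Jeff's point above that the application has to be
built with the same integer-width flag the library was built with, a
quick sanity check can confirm the two sides agree (a sketch; the
program name and printout are illustrative):

  program check_integer_width
    use mpi
    implicit none
    integer :: i, ierr

    call MPI_Init(ierr)
    ! With -fdefault-integer-8 (gfortran) or -i8 (ifort) this prints 8;
    ! without the flag it prints 4.  The application must match whatever
    ! the Open MPI Fortran bindings were built with, and any explicit
    ! status array must be declared with the library's MPI_STATUS_SIZE.
    print *, 'default INTEGER bytes:', bit_size(i) / 8
    print *, 'MPI_STATUS_SIZE      =', MPI_STATUS_SIZE
    call MPI_Finalize(ierr)
  end program check_integer_width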
> > > > <mpi-test-64bit.tar.bz2>
> >
> > <openmpi-1.6.5.config.tar.gz>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/