Jim --

This has been bugging me for a while.  I finally got to check today: it looks 
like we compute MPI_STATUS_SIZE correctly in both the 32- and 64-bit cases.

That is, MPI_STATUS_SIZE exactly reflects the size of the C MPI_Status (4 ints 
and a size_t, or 4 * 4 + 8 = 24 bytes), regardless of whether Fortran INTEGERs 
are 4 or 8 bytes.

There are two implications here:

1. OMPI should not be overwriting your status array if you're using 
MPI_STATUS_SIZE as its length.  Meaning: in the 32-bit case, MPI_STATUS_SIZE=6, 
and in the 64-bit case, MPI_STATUS_SIZE=3.

2. In the 64-bit case, you'll have a difficult time extracting the MPI status 
values from the 8-byte INTEGERs in the status array in Fortran (because the 
first 2 of the 3 are each really 2 4-byte integers).

So while #2 is a little weird (and probably should be fixed), a properly sized 
status array of length MPI_STATUS_SIZE shouldn't be causing any problems.  I'm 
still a little befuddled as to why you're seeing an error.  :-\
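
To illustrate what I mean by a properly sized status array, here's a minimal 
sketch (the program and variable names are just for illustration; run it with 
at least 2 processes):

  program status_check
    implicit none
    include 'mpif.h'
    integer :: rank, buf, ierr
    ! Sized with MPI_STATUS_SIZE, this is 6 INTEGERs with default 4-byte
    ! INTEGERs and 3 INTEGERs under -i8 / -fdefault-integer-8 -- either way
    ! it is 24 bytes, matching the C MPI_Status.
    integer :: status(MPI_STATUS_SIZE)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    if (rank == 0) then
       call MPI_SEND(rank, 1, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
       call MPI_RECV(buf, 1, MPI_INTEGER, 0, 99, MPI_COMM_WORLD, status, ierr)
       ! Caveat from point #2: with 8-byte INTEGERs, status(MPI_SOURCE) and
       ! status(MPI_TAG) may not line up with the underlying 4-byte C fields.
       print *, 'received', buf, 'from', status(MPI_SOURCE)
    end if
    call MPI_FINALIZE(ierr)
  end program status_check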




On Nov 1, 2013, at 4:51 PM, Jim Parker <jimparker96...@gmail.com> wrote:

> @Jeff,
>  
>  Well, it may have been "just for giggles", but it Worked!! My helloWorld 
> program ran as expected.  My original code ran through the initialization 
> parts without overwriting data.  It will take a few days to finish 
> computation and analysis to ensure it ran as expected.  I'll report back when 
> I get done.
>  
> It looks like a good Friday!
>  
> Cheers,
> --Jim
> 
> On Thu, Oct 31, 2013 at 4:06 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> For giggles, try using MPI_STATUS_IGNORE (assuming you don't need to look at 
> the status at all).  See if that works for you.
> 
> Meaning: I wonder if we're computing the status size for Fortran incorrectly 
> in the -i8 case...
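> 
> A sketch of what that receive might look like (buf, the source rank, and the 
> tag below are placeholders for whatever your code actually uses):
> 
>   ! receive without a status array at all
>   call MPI_RECV(buf, 1, MPI_INTEGER, 0, 99, MPI_COMM_WORLD, &
>                 MPI_STATUS_IGNORE, ierr)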
> 
> 
> On Oct 31, 2013, at 1:58 PM, Jim Parker <jimparker96...@gmail.com> wrote:
> 
> > > Some additional info that may jog some solutions.  Calls to MPI_SEND do not 
> > > cause memory corruption; only calls to MPI_RECV do.  Since the main 
> > > difference is that MPI_RECV needs a "status" array and MPI_SEND does not, 
> > > this seems to indicate to me that something is wrong with status.
> >
> > Also, I can run a C version of the helloWorld program with no errors.  
> > > However, C int types are only 4 bytes.  To send 8-byte integers, I define 
> > > tempInt as long int and pass MPI_LONG as the type.
> >
> > @Jeff,
> >   I got a copy of the openmpi config.log.  See attached.
> >
> > Cheers,
> > --Jim
> >
> > On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96...@gmail.com> 
> > wrote:
> > Ok, all, where to begin...
> >
> > Perhaps I should start with the most pressing issue for me: I need 64-bit 
> > indexing.
> >
> > @Martin,
> >    you indicated that even if I get this up and running, the MPI library 
> > still uses signed 32-bit ints to count (your term), or index (my term), the 
> > recvbuffer lengths.  More concretely, in a call to
> > MPI_Allgatherv(buffer, count, MPI_INTEGER, recvbuf, recvcounts, displs, 
> > MPI_INTEGER, MPI_COMM_WORLD, mpierr), the values of count, recvcounts, and 
> > displs must fit in 32-bit integers, not 64-bit.  Actually, all I need is 
> > for displs to hold 64-bit values...
> > If this is true, then compiling OpenMPI this way is not a solution.  I'll 
> > have to restructure my code to collect the data in 31-bit-sized chunks (see 
> > the sketch below)...
> > Not that it matters, but I'm not using DIRAC, but a custom code to compute 
> > circuit analyses.
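> >
> > Regarding that restructuring, here's a hypothetical sketch of what I have in
> > mind: emulate the big MPI_ALLGATHERV with one MPI_BCAST per rank, so that
> > every count stays below 2**31-1 and the 64-bit placement offset is a local
> > variable that is never handed to MPI.  All names are made up; counts(i) is
> > assumed to hold the number of elements contributed by rank i-1.
> >
> >   subroutine chunked_allgatherv(sendbuf, recvbuf, counts, comm)
> >     implicit none
> >     include 'mpif.h'
> >     integer :: sendbuf(*), recvbuf(*), counts(*), comm
> >     integer :: nprocs, rank, r, ierr
> >     integer :: offset   ! 8 bytes under -fdefault-integer-8, may exceed 2**31-1
> >
> >     call MPI_COMM_SIZE(comm, nprocs, ierr)
> >     call MPI_COMM_RANK(comm, rank, ierr)
> >     offset = 1
> >     do r = 0, nprocs - 1
> >        ! rank r copies its contribution into place, then broadcasts it
> >        if (r == rank) recvbuf(offset:offset+counts(r+1)-1) = sendbuf(1:counts(r+1))
> >        call MPI_BCAST(recvbuf(offset), counts(r+1), MPI_INTEGER, r, comm, ierr)
> >        offset = offset + counts(r+1)
> >     end do
> >   end subroutine chunked_allgatherv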
> >
> > @Jeff,
> >   Interesting: your runtime behavior shows a different error than mine.  You 
> > have problems with the passed variable tempInt, which would make sense for 
> > the reasons you gave.  However, my problem involves the fact that the local 
> > variable "rank" gets overwritten by a memory corruption after MPI_RECV is 
> > called.
> >
> > Re: config.log. I will try to have the admin guy recompile tomorrow and see 
> > if I can get the log for you.
> >
> > BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster.  I 
> > use the options -m64 and -fdefault-integer-8.
> >
> > Cheers,
> > --Jim
> >
> >
> >
> > On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
> > Hi Jim,
> >
> > I have quite a bit of experience with compiling openmpi for dirac.
> > Here is what I use to configure openmpi:
> >
> > ./configure --prefix=$instdir \
> >             --disable-silent-rules \
> >             --enable-mpirun-prefix-by-default \
> >             --with-threads=posix \
> >             --enable-cxx-exceptions \
> >             --with-tm=$torquedir \
> >             --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
> >             --with-openib \
> >             --with-hwloc=$hwlocdir \
> >             CC=gcc \
> >             CXX=g++ \
> >             FC="$FC" \
> >             F77="$FC" \
> >             CFLAGS="-O3" \
> >             CXXFLAGS="-O3" \
> >             FFLAGS="-O3 $I8FLAG" \
> >             FCFLAGS="-O3 $I8FLAG"
> >
> > You need to set FC to either ifort or gfortran (those are the two compilers
> > that I have used) and set I8FLAG to -fdefault-integer-8 for gfortran or
> > -i8 for ifort.
> > Set torquedir to the directory where torque is installed ($torquedir/lib
> > must contain libtorque.so), if you are running jobs under torque; otherwise
> > remove the --with-tm=... line.
> > Set hwlocdir to the directory where you have hwloc installed. You may not
> > need the --with-hwloc=... option because openmpi comes with its own hwloc
> > version (I don't have experience with that because we install hwloc
> > independently).
> > Set instdir to the directory where you want to install openmpi.
> > You may or may not need the --with-openib option depending on whether
> > you have an Infiniband interconnect.
> >
> > After configure/make/make install, the so-compiled version can be used
> > with dirac without changing the dirac source code.
> > (There is one caveat: you should make sure that all "count" variables
> > in MPI calls in dirac are smaller than 2^31-1. I have run into a few cases
> > where that is not the case; the problem can be overcome by replacing the
> > MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce
> > repeatedly; a sketch of such a wrapper follows the setup commands below.)
> > This is what I use to set up dirac:
> >
> > export PATH=$instdir/bin:$PATH
> > ./setup --prefix=$diracinstdir \
> >         --fc=mpif90 \
> >         --cc=mpicc \
> >         --int64 \
> >         --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
> >
> > where $instdir is the directory where you installed openmpi from above.
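> >
> > As for the MPI_Allreduce wrapper mentioned above, here is a minimal sketch
> > of the idea (an in-place integer sum; the names are made up, and with the
> > I8FLAG every INTEGER below is 8 bytes, so n may legitimately exceed 2^31-1):
> >
> >   subroutine chunked_allreduce_sum(buf, n, comm)
> >     implicit none
> >     include 'mpif.h'
> >     integer :: buf(*), n, comm
> >     integer :: done, chunk, ierr
> >     integer, parameter :: maxchunk = 2**30   ! stay safely below 2^31-1
> >
> >     done = 0
> >     do while (done < n)
> >        chunk = min(maxchunk, n - done)
> >        call MPI_ALLREDUCE(MPI_IN_PLACE, buf(done+1), chunk, &
> >                           MPI_INTEGER, MPI_SUM, comm, ierr)
> >        done = done + chunk
> >     end do
> >   end subroutine chunked_allreduce_sum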
> >
> > I would never use the so-compiled openmpi version for anything other
> > than dirac though. I am not saying that it cannot work (at a minimum
> > you need to compile Fortran programs with the appropriate I8FLAG),
> > but it is an unnecessary complication: I have not encountered a piece
> > of software other than dirac that requires this.
> >
> > Cheers,
> > Martin
> >
> > --
> > Martin Siegert
> > Head, Research Computing
> > WestGrid/ComputeCanada Site Lead
> > Simon Fraser University
> > Burnaby, British Columbia
> > Canada
> >
> > On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
> > >
> > >    Jeff,
> > >      Here's what I know:
> > >    1.  Checked FAQs.  Done
> > >    2.  Version 1.6.5
> > >    3. config.log file has been removed by the sysadmin...
> > >    4. ompi_info -a from the head node is attached as headnode.out
> > >    5. N/A
> > >    6. compute node info is attached as compute-x-yy.out
> > >    7. As discussed, local variables are being overwritten after calls to
> > >    MPI_RECV from Fortran code
> > >    8. ifconfig output from head node and computes listed as *-ifconfig.out
> > >    Cheers,
> > >    --Jim
> > >
> > >    On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres)
> > >    <[1]jsquy...@cisco.com> wrote:
> > >
> > >      Can you send the information listed here:
> > >          [2]http://www.open-mpi.org/community/help/
> > >
> > >    On Oct 30, 2013, at 6:22 PM, Jim Parker <[3]jimparker96...@gmail.com>
> > >    wrote:
> > >    > Jeff and Ralph,
> > >    >   Ok, I downshifted to a helloWorld example (attached); bottom line:
> > >    after I hit the MPI_Recv call, my local variable (rank) gets borked.
> > >    >
> > >    > I have compiled with -m64 -fdefault-integer-8 and have even assigned
> > >    kind=8 to the integers (which would be the preferred method in my case).
> > >    >
> > >    > Your help is appreciated.
> > >    >
> > >    > Cheers,
> > >    > --Jim
> > >    >
> > >    >
> > >    >
> > >    > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres)
> > >    <[4]jsquy...@cisco.com> wrote:
> > >    > On Oct 30, 2013, at 4:35 PM, Jim Parker <[5]jimparker96...@gmail.com>
> > >    wrote:
> > >    >
> > >    > >   I have recently built a cluster that uses the 64-bit indexing
> > >    feature of OpenMPI following the directions at
> > >    > >
> > >    [6]http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
> > >    >
> > >    > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for
> > >    OMPI 1.6.x).
> > >    >
> > >    > > My question is: what are the new prototypes for the MPI calls?
> > >    > > Specifically:
> > >    > > MPI_RECV
> > >    > > MPI_Allgatherv
> > >    >
> > >    > They're the same as they've always been.
> > >    >
> > >    > The magic is that the -i8 flag tells the compiler "make all Fortran
> > >    INTEGERs be 8 bytes, not (the default) 4."  So Ralph's answer was
> > >    correct in that all the MPI parameters are INTEGERs -- but you can tell
> > >    the compiler that all INTEGERs are 8 bytes, not 4, and therefore get
> > >    "large" integers.
> > >    >
> > >    > Note that this means that you need to compile your application with
> > >    -i8, too.  That will make *your* INTEGERs also be 8 bytes, and then
> > >    you'll match what Open MPI is doing.
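> > >    >
> > >    > A quick sanity check, if you want one (a hypothetical snippet; compile
> > >    > it with and without -i8 / -fdefault-integer-8 and compare the output):
> > >    >
> > >    >   program intsize
> > >    >     implicit none
> > >    >     integer :: i
> > >    >     ! prints 4 with default INTEGERs, 8 when built with -i8
> > >    >     print *, 'bytes per default INTEGER:', bit_size(i) / 8
> > >    >   end program intsize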
> > >    >
> > >    > > I'm curious because some of my local variables get killed (set to
> > >    null) upon my first call to MPI_RECV.  Typically, this is due (in
> > >    Fortran) to someone not setting the 'status' variable to an appropriate
> > >    array size.
> > >    >
> > >    > If you didn't compile your application with -i8, this could well be
> > >    because your application is treating INTEGERs as 4 bytes, but OMPI is
> > >    treating INTEGERs as 8 bytes.  Nothing good can come from that.
> > >    >
> > >    > If you *did* compile your application with -i8 and you're seeing this
> > >    kind of wonkiness, we should dig deeper and see what's going on.
> > >    >
> > >    > > My review of mpif.h and mpi.h seems to indicate that the functions
> > >    are defined with C int types and that therefore, I assume, the coercion
> > >    during the compile makes the library support 64-bit indexing, i.e.,
> > >    int -> long int.
> > >    >
> > >    > FWIW: We actually define a type MPI_Fint; its actual type is
> > >    determined by configure (int or long int, IIRC).  When your Fortran
> > >    code calls C, we use the MPI_Fint type for parameters, and so it will
> > >    be either a 4- or 8-byte integer type.
> > >    >
> > >    > --
> > >    > Jeff Squyres
> > >    > [7]jsquy...@cisco.com
> > >    > For corporate legal information go to:
> > >    [8]http://www.cisco.com/web/about/doing_business/legal/cri/
> > >    >
> > >    >
> > >
> > >      >
> > >      <mpi-test-64bit.tar.bz2>
> > >
> > >    --
> > >    Jeff Squyres
> > >    [13]jsquy...@cisco.com
> > >    For corporate legal information go to:
> > >    [14]http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > > References
> > >
> > >    1. mailto:jsquy...@cisco.com
> > >    2. http://www.open-mpi.org/community/help/
> > >    3. mailto:jimparker96...@gmail.com
> > >    4. mailto:jsquy...@cisco.com
> > >    5. mailto:jimparker96...@gmail.com
> > >    6. 
> > > http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
> > >    7. mailto:jsquy...@cisco.com
> > >    8. http://www.cisco.com/web/about/doing_business/legal/cri/
> > >    9. mailto:us...@open-mpi.org
> > >   10. http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >   11. mailto:us...@open-mpi.org
> > >   12. http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >   13. mailto:jsquy...@cisco.com
> > >   14. http://www.cisco.com/web/about/doing_business/legal/cri/
> > >   15. mailto:us...@open-mpi.org
> > >   16. http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> >
> >
> >
> > <openmpi-1.6.5.config.tar.gz>
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
