Some additional info that may suggest some solutions.  Calls to MPI_SEND do not
cause memory corruption; only calls to MPI_RECV do.  Since the main
difference is that MPI_RECV needs a "status" array and MPI_SEND does
not, this suggests to me that something is wrong with status.
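A minimal sketch of the suspected failure mode (illustrative Fortran only, not the attached helloWorld): in Fortran, MPI_RECV writes MPI_STATUS_SIZE integers into its status argument, so a scalar or undersized status lets MPI_RECV overrun it and clobber neighboring locals such as "rank".

```fortran
! Sketch: correct vs. incorrect status declaration for MPI_RECV.
! (Illustrative only; requires an MPI library to build and run.)
program recv_demo
  implicit none
  include 'mpif.h'
  integer :: rank, ierr, buf
  integer :: status(MPI_STATUS_SIZE)  ! correct: full-size status array
  ! integer :: status                 ! WRONG: scalar gets overrun,
                                      ! corrupting adjacent locals

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
     buf = 42
     call MPI_SEND(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
     call MPI_RECV(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
  end if
  call MPI_FINALIZE(ierr)
end program recv_demo
```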

Also, I can run a C version of the helloWorld program with no errors.
However, int types are only 4-byte.  To send 8-byte integers, I define
tempInt as long int and pass MPI_LONG as the type.
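For reference, that C-side workaround looks roughly like this (a sketch; tempInt is the variable named above, the surrounding scaffolding is assumed):

```c
/* Sketch of the workaround described above: in C the integer width is
 * explicit in the datatype argument, so an 8-byte value is sent by
 * declaring tempInt as long and passing MPI_LONG.  (MPI_INT would only
 * move 4 bytes.)  Illustrative only; requires an MPI library. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    long tempInt;                      /* 8 bytes on an LP64 system */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        tempInt = 1L << 40;            /* value needing more than 32 bits */
        MPI_Send(&tempInt, 1, MPI_LONG, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&tempInt, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD, &status);
    }
    MPI_Finalize();
    return 0;
}
```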

@Jeff,
  I got a copy of the openmpi conf.log.  See attached.

Cheers,
--Jim

On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker <jimparker96...@gmail.com> wrote:

>        Ok, all, where to begin...
>
> Perhaps I should start with the most pressing issue for me: I need 64-bit
> indexing.
>
> @Martin,
>    you indicated that even if I get this up and running, the MPI library
> still uses signed 32-bit ints to count (your term), or index (my term) the
> recvbuffer lengths.  More concretely,
> in a call to MPI_Allgatherv(buffer, count, MPI_INTEGER, recvbuf,
> recvcounts, displs, MPI_INTEGER, MPI_COMM_WORLD, mpierr): count,
> recvcounts, and displs must be 32-bit integers, not 64-bit.  Actually, all
> I need is displs to hold 64-bit values...
> If this is true, then compiling OpenMPI this way is not a solution.  I'll
> have to restructure my code to collect 31-bit chunks...
> Not that it matters, but I'm not using DIRAC, but a custom code to compute
> circuit analyses.
>
> @Jeff,
>   Interesting; your run fails with a different error than mine.  You
> have problems with the passed variable tempInt, which would make sense for
> the reasons you gave.  However, my problem involves the fact that the local
> variable "rank" gets overwritten by a memory corruption after MPI_RECV is
> called.
>
> Re: config.log. I will try to have the admin guy recompile tomorrow and
> see if I can get the log for you.
>
> BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster.  I
> use the options -m64 and -fdefault-integer-8
>
> Cheers,
> --Jim
>
>
>
> On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert <sieg...@sfu.ca> wrote:
>
>> Hi Jim,
>>
>> I have quite a bit of experience with compiling openmpi for dirac.
>> Here is what I use to configure openmpi:
>>
>> ./configure --prefix=$instdir \
>>             --disable-silent-rules \
>>             --enable-mpirun-prefix-by-default \
>>             --with-threads=posix \
>>             --enable-cxx-exceptions \
>>             --with-tm=$torquedir \
>>             --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
>>             --with-openib \
>>             --with-hwloc=$hwlocdir \
>>             CC=gcc \
>>             CXX=g++ \
>>             FC="$FC" \
>>             F77="$FC" \
>>             CFLAGS="-O3" \
>>             CXXFLAGS="-O3" \
>>             FFLAGS="-O3 $I8FLAG" \
>>             FCFLAGS="-O3 $I8FLAG"
>>
>> You need to set FC to either ifort or gfortran (those are the two
>> compilers
>> that I have used) and set I8FLAG to -fdefault-integer-8 for gfortran or
>> -i8 for ifort.
>> Set torquedir to the directory where torque is installed ($torquedir/lib
>> must contain libtorque.so), if you are running jobs under torque;
>> otherwise
>> remove the --with-tm=... line.
>> Set hwlocdir to the directory where you have hwloc installed. You may not
>> need the --with-hwloc=... option because openmpi comes with a hwloc version
>> (I don't have experience with that because we install hwloc
>> independently).
>> Set instdir to the directory where you want to install openmpi.
>> You may or may not need the --with-openib option depending on whether
>> you have an Infiniband interconnect.
>>
>> After configure/make/make install, this so-compiled version can be used
>> with dirac without changing the dirac source code.
>> (There is one caveat: you should make sure that all "count" variables
>> in MPI calls in dirac are smaller than 2^31-1. I have run into a few cases
>> where that is not the case; this can be overcome by replacing
>> MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce
>> repeatedly.) This is what I use to set up dirac:
>>
>> export PATH=$instdir/bin
>> ./setup --prefix=$diracinstdir \
>>         --fc=mpif90 \
>>         --cc=mpicc \
>>         --int64 \
>>         --explicit-libs="-lmkl_intel_ilp64 -lmkl_sequential -lmkl_core"
>>
>> where $instdir is the directory where you installed openmpi from above.
>>
>> I would never use the so-compiled openmpi version for anything other
>> than dirac though. I am not saying that it cannot work (at a minimum
>> you need to compile Fortran programs with the appropriate I8FLAG),
>> but it is an unnecessary complication: I have not encountered a piece
>> of software other than dirac that requires this.
>>
>> Cheers,
>> Martin
>>
>> --
>> Martin Siegert
>> Head, Research Computing
>> WestGrid/ComputeCanada Site Lead
>> Simon Fraser University
>> Burnaby, British Columbia
>> Canada
>>
>> On Wed, Oct 30, 2013 at 06:00:56PM -0500, Jim Parker wrote:
>> >
>> >    Jeff,
>> >      Here's what I know:
>> >    1.  Checked FAQs.  Done
>> >    2.  Version 1.6.5
>> >    3. config.log file has been removed by the sysadmin...
>> >    4. ompi_info -a from head node is in attached as headnode.out
>> >    5. N/A
>> >    6. compute node info in attached as compute-x-yy.out
>> >    7. As discussed, local variables are being overwritten after calls to
>> >    MPI_RECV from Fortran code
>> >    8. ifconfig output from head node and computes attached as
>> >    *-ifconfig.out
>> >    Cheers,
>> >    --Jim
>> >
>> >    On Wed, Oct 30, 2013 at 5:29 PM, Jeff Squyres (jsquyres)
>> >    <[1]jsquy...@cisco.com> wrote:
>> >
>> >      Can you send the information listed here:
>> >          [2]http://www.open-mpi.org/community/help/
>> >
>> >    On Oct 30, 2013, at 6:22 PM, Jim Parker <[3]jimparker96...@gmail.com>
>> >    wrote:
>> >    > Jeff and Ralph,
>> >    >   Ok, I downshifted to a helloWorld example (attached); bottom line:
>> >    after I hit the MPI_Recv call, my local variable (rank) gets borked.
>> >    >
>> >    > I have compiled with -m64 -fdefault-integer-8 and even have
>> >    assigned kind=8 to the integers (which would be the preferred method
>> >    in my case).
>> >    >
>> >    > Your help is appreciated.
>> >    >
>> >    > Cheers,
>> >    > --Jim
>> >    >
>> >    >
>> >    >
>> >    > On Wed, Oct 30, 2013 at 4:49 PM, Jeff Squyres (jsquyres)
>> >    <[4]jsquy...@cisco.com> wrote:
>> >    > On Oct 30, 2013, at 4:35 PM, Jim Parker
>> >    <[5]jimparker96...@gmail.com> wrote:
>> >    >
>> >    > >   I have recently built a cluster that uses the 64-bit indexing
>> >    feature of OpenMPI following the directions at
>> >    > >
>> >    [6]http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
>> >    >
>> >    > That should be correct (i.e., passing -i8 in FFLAGS and FCFLAGS for
>> >    OMPI 1.6.x).
>> >    >
>> >    > > My question is: what are the new prototypes for the MPI calls?
>> >    > > Specifically:
>> >    > > MPI_RECV
>> >    > > MPI_Allgatherv
>> >    >
>> >    > They're the same as they've always been.
>> >    >
>> >    > The magic is that the -i8 flag tells the compiler "make all Fortran
>> >    INTEGERs be 8 bytes, not (the default) 4."  So Ralph's answer was
>> >    correct in that all the MPI parameters are INTEGERs -- but you can
>> >    tell the compiler that all INTEGERs are 8 bytes, not 4, and therefore
>> >    get "large" integers.
>> >    >
>> >    > Note that this means that you need to compile your application with
>> >    -i8, too.  That will make *your* INTEGERs also be 8 bytes, and then
>> >    you'll match what Open MPI is doing.
>> >    >
>> >    > > I'm curious because some of my local variables get killed (set to
>> >    null) upon my first call to MPI_RECV.  Typically, this is due (in
>> >    Fortran) to someone not setting the 'status' variable to an
>> >    appropriate array size.
>> >    >
>> >    > If you didn't compile your application with -i8, this could well be
>> >    because your application is treating INTEGERs as 4 bytes, but OMPI is
>> >    treating INTEGERs as 8 bytes.  Nothing good can come from that.
>> >    >
>> >    > If you *did* compile your application with -i8 and you're seeing
>> >    this kind of wonkiness, we should dig deeper and see what's going on.
>> >    >
>> >    > > My review of mpif.h and mpi.h seems to indicate that the functions
>> >    are defined as C int types and therefore, I assume, the coercion
>> >    during the compile makes the library support 64-bit indexing, i.e.,
>> >    int -> long int.
>> >    >
>> >    > FWIW: We actually define a type MPI_Fint; its actual type is
>> >    determined by configure (int or long int, IIRC).  When your Fortran
>> >    code calls C, we use the MPI_Fint type for parameters, and so it will
>> >    be either a 4 or 8 byte integer type.
>> >    >
>> >    > --
>> >    > Jeff Squyres
>> >    > [7]jsquy...@cisco.com
>> >    > For corporate legal information go to:
>> >    [8]http://www.cisco.com/web/about/doing_business/legal/cri/
>> >    >
>> >    > _______________________________________________
>> >    > users mailing list
>> >    > [9]us...@open-mpi.org
>> >    > [10]http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >    >
>> >
>> >
>> >    <mpi-test-64bit.tar.bz2>
>> >
>> >    --
>> >    Jeff Squyres
>> >    [13]jsquy...@cisco.com
>> >    For corporate legal information go to:
>> >    [14]http://www.cisco.com/web/about/doing_business/legal/cri/
>> >
>> > References
>> >
>> >    1. mailto:jsquy...@cisco.com
>> >    2. http://www.open-mpi.org/community/help/
>> >    3. mailto:jimparker96...@gmail.com
>> >    4. mailto:jsquy...@cisco.com
>> >    5. mailto:jimparker96...@gmail.com
>> >    6. http://wiki.chem.vu.nl/dirac/index.php/How_to_build_MPI_libraries_for_64-bit_integers
>> >    7. mailto:jsquy...@cisco.com
>> >    8. http://www.cisco.com/web/about/doing_business/legal/cri/
>> >    9. mailto:us...@open-mpi.org
>> >   10. http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >   11. mailto:us...@open-mpi.org
>> >   12. http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >   13. mailto:jsquy...@cisco.com
>> >   14. http://www.cisco.com/web/about/doing_business/legal/cri/
>> >   15. mailto:us...@open-mpi.org
>> >   16. http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>
>
>

Attachment: openmpi-1.6.5.config.tar.gz
Description: GNU Zip compressed data
