Re: [OMPI users] Prototypes for Fortran MPI_ commands using 64-bit indexing

2013-11-01 Thread Jeff Squyres (jsquyres)
On Oct 31, 2013, at 6:03 PM, Jeff Hammond  wrote:

> Why not just make your first level internal API equivalent to the MPI
> public API except for s/int/size_t/g and have the Fortran bindings
> drop directly into that?  Going through the C int-erface seems like a
> recipe for endless pain...


The design decision was made a long time ago to have the Fortran bindings call 
the C bindings so that we only had to have all the MPI API error checking code 
in one place (e.g., bad arguments and all that).

We *probably* could skip the C bindings:

- The Fortran bindings are in the middle of a (long-term) revamp to be 
completely generated (vs. hand-coded).  This effort will take a while to 
complete, but will happen eventually.  The point here is that the generated 
code could certainly skip calling the C bindings (although calling the C 
bindings is easier -- it makes the generation more formulaic).

- Not all the back-end APIs understand "large" integers.  For example, back-end 
MPI_INFO API calls only handle int, and would need to be updated.  The only 
point here is that there's more to do than just calling the back-end APIs -- 
even though the message-passing APIs use large integers internally, the 
non-sexy/non-message-passing stuff doesn't.
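
To make the integer-width issue concrete, here is a minimal Fortran illustration (a sketch only, assuming gfortran/ifort kind numbering and compilation with -fdefault-integer-8 or -i8) of why a 64-bit default INTEGER count cannot simply be funneled through an interface that only takes a 32-bit int:

program narrow_demo
  ! With -fdefault-integer-8 (gfortran) or -i8 (ifort), default INTEGER is 8 bytes.
  implicit none
  integer         :: big_count    ! 8 bytes under -fdefault-integer-8 / -i8
  integer(kind=4) :: as_c_int     ! what an int-based back-end API would see
  big_count = 3000000000_8        ! a perfectly legal count for an 8-byte integer
  as_c_int  = int(big_count, kind=4)  ! does not fit; wraps (the exact value is
                                      ! processor-dependent)
  print *, 'original count:', big_count, '  after narrowing:', as_c_int
end program narrow_demo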

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] SIGSEGV in opal_hwloc152_hwloc_bitmap_or.A // Bug in 'hwloc'?

2013-11-01 Thread Jeff Squyres (jsquyres)
Hey Paul --

I'm going to move this over to the hwloc users list; let's see if we can get 
this issue addressed over there.  I already noticed an oddity in the XML file 
you sent.


On Oct 31, 2013, at 1:28 PM, Paul Kapinos  wrote:

> Hello all,
> 
> using 1.7.x (1.7.2 and 1.7.3 tested), we get a SIGSEGV from somewhere deep 
> inside the 'hwloc' library - see the attached screenshot.
> 
> Because the error is tied to just one single node, which in turn is a 
> somewhat special one (see output of 'lstopo -'), it smells like an error in 
> the 'hwloc' library.
> 
> Is there a way to disable hwloc or to debug it somehow?
> (besides building a debug version of hwloc and Open MPI)
> 
> Best
> 
> Paul
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Prototypes for Fortran MPI_ commands using 64-bit indexing

2013-11-01 Thread Jim Parker
@Jeff,

 Well, it may have been "just for giggles", but it worked!  My helloWorld
program ran as expected.  My original code ran through the initialization
parts without overwriting data.  It will take a few days to finish the
computation and analysis to ensure it ran as expected.  I'll report back
when I'm done.

It looks like a good Friday!

Cheers,
--Jim

On Thu, Oct 31, 2013 at 4:06 PM, Jeff Squyres (jsquyres)  wrote:

> For giggles, try using MPI_STATUS_IGNORE (assuming you don't need to look
> at the status at all).  See if that works for you.
>
> Meaning: I wonder if we're computing the status size for Fortran
> incorrectly in the -i8 case...
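
For concreteness, a minimal Fortran sketch of the two receive variants being discussed (this is not Jim's program; names are illustrative).  The key detail is that status must be dimensioned with MPI_STATUS_SIZE, or skipped entirely with MPI_STATUS_IGNORE:

program recv_sketch
  use mpi
  implicit none
  integer :: ierr, rank, buf
  integer :: status(MPI_STATUS_SIZE)   ! correctly sized status array

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 1) then
     buf = 42
     call MPI_Send(buf, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 0) then
     ! Variant 1: pass a properly dimensioned status array
     call MPI_Recv(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, status, ierr)
     ! Variant 2 (the suggestion above): ignore the status entirely
     ! call MPI_Recv(buf, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, &
     !               MPI_STATUS_IGNORE, ierr)
  end if

  call MPI_Finalize(ierr)
end program recv_sketch

Run with at least two processes (e.g. mpirun -np 2 ./recv_sketch).  If the status size really is computed incorrectly in the -i8 case, the MPI_STATUS_IGNORE variant sidesteps the problem because the library never writes into a user-supplied status at all.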
>
>
> On Oct 31, 2013, at 1:58 PM, Jim Parker  wrote:
>
> > Some additional info that may jog some solutions.  Calls to MPI_SEND do
> > not cause memory corruption; only calls to MPI_RECV do.  Since the main
> > difference is that MPI_RECV needs a "status" array and SEND does not, this
> > seems to indicate to me that something is wrong with the status.
> >
> > Also, I can run a C version of the helloWorld program with no errors.
> > However, int types are only 4 bytes there.  To send 8-byte integers, I
> > define tempInt as long int and pass MPI_LONG as the type.
> >
> > @Jeff,
> >   I got a copy of the openmpi conf.log.  See attached.
> >
> > Cheers,
> > --Jim
> >
> > On Wed, Oct 30, 2013 at 10:55 PM, Jim Parker 
> wrote:
> > Ok, all, where to begin...
> >
> > Perhaps I should start with the most pressing issue for me.  I need
> 64-bit indexing
> >
> > @Martin,
> > you indicated that even if I get this up and running, the MPI library
> > still uses signed 32-bit ints to count (your term), or index (my term) the
> > recv-buffer lengths.  More concretely, in a call to
> > MPI_Allgatherv(buffer, count, MPI_INTEGER, recvbuf, recvcounts, displs,
> > MPI_INTEGER, MPI_COMM_WORLD, mpierr), the count, recvcounts, and displs
> > arguments must be 32-bit integers, not 64-bit.  Actually, all I need is
> > for displs to hold 64-bit values...
> > If this is true, then compiling OpenMPI this way is not a solution.
> > I'll have to restructure my code to collect 31-bit chunks...
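
For reference, a self-contained Fortran sketch of the call shape in question (illustrative names and sizes only, not Jim's code; note that MPI_Allgatherv takes no status argument, and whether individual counts/displs above 2**31-1 actually work is exactly the open question here):

program allgatherv_shape
  use mpi
  implicit none
  integer :: mpierr, rank, nprocs, i, count
  integer, allocatable :: recvcounts(:), displs(:), sendbuf(:), recvbuf(:)

  call MPI_Init(mpierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)

  count = rank + 1                 ! uneven contributions, just for illustration
  allocate(sendbuf(count))
  sendbuf = rank

  allocate(recvcounts(nprocs), displs(nprocs))
  do i = 1, nprocs
     recvcounts(i) = i
     displs(i)     = (i - 1) * i / 2      ! prefix sum of recvcounts
  end do
  allocate(recvbuf(nprocs * (nprocs + 1) / 2))

  call MPI_Allgatherv(sendbuf, count, MPI_INTEGER, &
                      recvbuf, recvcounts, displs, MPI_INTEGER, &
                      MPI_COMM_WORLD, mpierr)

  if (rank == 0) print *, recvbuf
  call MPI_Finalize(mpierr)
end program allgatherv_shape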
> > Not that it matters, but I'm not using DIRAC, but a custom code to
> compute circuit analyses.
> >
> > @Jeff,
> >   Interesting, your run shows a different error than mine.  You have
> > problems with the passed variable tempInt, which would make sense for the
> > reasons you gave.  However, my problem is that the local variable "rank"
> > gets overwritten by memory corruption after MPI_RECV is called.
> >
> > Re: config.log. I will try to have the admin guy recompile tomorrow and
> see if I can get the log for you.
> >
> > BTW, I'm using the gcc 4.7.2 compiler suite on a Rocks 5.4 HPC cluster.
>  I use the options -m64 and -fdefault-integer-8
> >
> > Cheers,
> > --Jim
> >
> >
> >
> > On Wed, Oct 30, 2013 at 7:36 PM, Martin Siegert  wrote:
> > Hi Jim,
> >
> > I have quite a bit of experience with compiling openmpi for dirac.
> > Here is what I use to configure openmpi:
> >
> > ./configure --prefix=$instdir \
> > --disable-silent-rules \
> > --enable-mpirun-prefix-by-default \
> > --with-threads=posix \
> > --enable-cxx-exceptions \
> > --with-tm=$torquedir \
> > --with-wrapper-ldflags="-Wl,-rpath,${instdir}/lib" \
> > --with-openib \
> > --with-hwloc=$hwlocdir \
> > CC=gcc \
> > CXX=g++ \
> > FC="$FC" \
> > F77="$FC" \
> > CFLAGS="-O3" \
> > CXXFLAGS="-O3" \
> > FFLAGS="-O3 $I8FLAG" \
> > FCFLAGS="-O3 $I8FLAG"
> >
> > You need to set FC to either ifort or gfortran (those are the two
> > compilers that I have used) and set I8FLAG to -fdefault-integer-8 for
> > gfortran or -i8 for ifort.
> > Set torquedir to the directory where torque is installed ($torquedir/lib
> > must contain libtorque.so), if you are running jobs under torque;
> > otherwise remove the --with-tm=... line.
> > Set hwlocdir to the directory where you have hwloc installed. You may not
> > need the --with-hwloc=... option because openmpi comes with its own hwloc
> > version (I don't have experience with that because we install hwloc
> > independently).
> > Set instdir to the directory where you want to install openmpi.
> > You may or may not need the --with-openib option depending on whether
> > you have an InfiniBand interconnect.
> >
> > After configure/make/make install, the version compiled this way can be
> > used with dirac without changing the dirac source code.
> > (There is one caveat: you should make sure that all "count" variables
> > in MPI calls in dirac are smaller than 2^31-1. I have run into a few
> > cases where that is not so; this can be overcome by replacing the
> > MPI_Allreduce calls in dirac with a wrapper that calls MPI_Allreduce
> > repeatedly -- a rough sketch of such a wrapper appears at the end of
> > this message.) This is what I use to set up dirac:
> >
> > export PATH=$instdir/bin
> > ./setup --prefix=
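
A rough Fortran sketch of the chunked-MPI_Allreduce wrapper described above (not Martin's actual code; it assumes a double-precision buffer, MPI_SUM, and an -i8 build in which default INTEGER is 8 bytes):

subroutine allreduce_chunked(sendbuf, recvbuf, n, comm, ierr)
  use mpi
  implicit none
  ! Default INTEGER is 8 bytes when everything is built with -i8 /
  ! -fdefault-integer-8, matching the build discussed in this thread.
  integer, intent(in)       :: n, comm
  real(kind=8), intent(in)  :: sendbuf(n)
  real(kind=8), intent(out) :: recvbuf(n)
  integer, intent(out)      :: ierr
  integer :: offset, chunk
  integer, parameter :: max_chunk = 2**30   ! each call stays well below 2**31-1

  offset = 1
  do while (offset <= n)
     chunk = min(max_chunk, n - offset + 1)
     call MPI_Allreduce(sendbuf(offset), recvbuf(offset), chunk, &
                        MPI_DOUBLE_PRECISION, MPI_SUM, comm, ierr)
     if (ierr /= MPI_SUCCESS) return
     offset = offset + chunk
  end do
end subroutine allreduce_chunked

The same chunking idea applies to other collectives (or other datatypes and reduction operations) whose total counts may exceed 2**31-1.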