We did update ROMIO at some point in there, so it is possible this is a ROMIO 
bug that we have picked up. I've asked someone to check upstream about it.


On Jan 17, 2014, at 12:02 PM, Ronald Cohen <rhco...@lbl.gov> wrote:

> Sorry, too many entries in this thread, I guess.  My general goal is to get a 
> working parallel hdf5 with openmpi on Mac OS X Mavericks.  At one point in 
> the saga I had romio disabled, which naturally doesn't work for hdf5 (which 
> is trying to read/write files in parallel), so the hdf5 tests would of 
> course fail.  I subsequently had link errors with hdf5 because I was 
> building openmpi with --disable-static, whereas the default (and recommended) 
> option for hdf5 is to disable shared and build static.  My most recent 
> attempts were with openmpi built with --enable-static and --disable-dlopen.  
> In that case, with openmpi 1.7.4rc1, hdf5 1.8.12 configured and built 
> successfully, but make check-p produced many errors in its t_mpi tests, with 
> messages like "proc 4: found data error at [2140143616+0], expect -7, got 6".  
> The errors were reproduced by the HDF5 testing team with openmpi 1.7.4rc1, 
> but not with 1.7.3 (which I am now building).
> 
> Hopefully that is an adequate summary.
> 
> Ron
> 
> 
> 
> On Fri, Jan 17, 2014 at 11:44 AM, Jeff Squyres (jsquyres) 
> <jsquy...@cisco.com> wrote:
> Can you specify exactly which issue you're referring to?
> 
> - test failing when you had ROMIO disabled
> - test (sometimes) failing when you had ROMIO enabled
> - compiling / linking issues
> 
> ?
> 
> 
> On Jan 17, 2014, at 1:50 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> 
> > Hello Ralph and others, I just got the following back from the HDF-5 
> > support group, suggesting an ompi bug.   So I should either try 1.7.3 or a 
> > recent nightly 1.7.4.    Will likely opt for 1.7.3, but hopefully someone 
> > at openmpi can look at the problem for 1.7.4.   In short, the challenge is 
> > to get a parallel hdf5 that passes make check-p with 1.7.4.
> >
> > ------------------
> > Hi Ron,
> >
> > I had sent your message to the developer and he can reproduce the issue.
> > Here is what he says:
> >
> >  ---
> >  I replicated this on Jam with ompi 1.7.4rc1. I saw the same error he is 
> >  seeing.
> >  Note that this is an unstable release of ompi.
> >  I tried ompi 1.7.3 (a feature release, a little more stable). I didn't see 
> >  the problems there.
> >
> >  So this is an ompi bug. He can report it to the ompi list. He can just 
> >  point them to the t_mpi.c tests in our test suite in testpar/ and say it 
> >  occurs with their 1.7.4rc1.
> >  ---
> >
> > -Barbara
> >
> >
> >
> > On Fri, Jan 17, 2014 at 9:39 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > Thanks, I've just gotten an email with some suggestions (and promise of 
> > more help) from the HDF5 support team.   I will report back here, as it may 
> > be of interest to others trying to build hdf5 on mavericks.
> >
> >
> > On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > Afraid I have no idea, but hopefully someone else here with experience with 
> > HDF5 can chime in?
> >
> >
> > On Jan 17, 2014, at 9:03 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> >
> >> Still a timely response, thank you.  The particular problem I noted 
> >> hasn't recurred; for reasons I will explain shortly I had to rebuild 
> >> openmpi again, and this time Sample_mpio.c compiled and ran successfully 
> >> from the start.
> >>
> >> But now my problem is trying to get parallel HDF5 to run.  In my first 
> >> attempt to build HDF5 it failed in the link stage because of unsatisfied 
> >> externals from openmpi, and I deduced the problem was having built openmpi 
> >> with --disable-static.  So I rebuilt with --enable-static and 
> >> --disable-dlopen (emulating a successful openmpi + hdf5 combination I had 
> >> built on Snow Leopard).  Once again openmpi passed its make check tests, 
> >> and as noted above the Sample_mpio.c test compiled and ran fine.  And the 
> >> parallel hdf5 configure and make steps ran successfully.  But when I ran 
> >> make check for hdf5, the serial tests passed but none of the parallel 
> >> tests did.  Over a million test failures!  Error messages like:
> >>
> >> Proc 0: *** MPIO File size range test...
> >> --------------------------------
> >> MPI_Offset is signed 8 bytes integeral type
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file write test MPItest.h5
> >> MPIO GB file read test MPItest.h5
> >> MPIO GB file read test MPItest.h5
> >> MPIO GB file read test MPItest.h5
> >> MPIO GB file read test MPItest.h5
> >> proc 3: found data error at [2141192192+0], expect -6, got 5
> >> proc 3: found data error at [2141192192+1], expect -6, got 5
> >>
> >> And the specific errors reported -- which processor, which location, and 
> >> the total number of errors -- change if I rerun make check.
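> >>
> >> In case it helps anyone reproduce this outside the HDF5 tree, here is a 
> >> minimal sketch of the kind of large-offset write/read the GB file test 
> >> exercises (the file name, block size, and offsets below are my own 
> >> illustration, not taken from t_mpi.c):
> >>
> >> /* illustrative only -- not the actual t_mpi.c GB file test */
> >> #include <mpi.h>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >>
> >> #define BLOCK (1024*1024)   /* 1 MiB per rank, arbitrary */
> >>
> >> int main(int argc, char **argv)
> >> {
> >>     int rank, i, nerr = 0;
> >>     char *buf = malloc(BLOCK);
> >>     MPI_File fh;
> >>     MPI_Offset off;
> >>
> >>     MPI_Init(&argc, &argv);
> >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>     /* place each rank's block just past the 2 GB mark, so offsets need
> >>        the full width of MPI_Offset */
> >>     off = (MPI_Offset)2 * 1024 * 1024 * 1024 + (MPI_Offset)rank * BLOCK;
> >>
> >>     for (i = 0; i < BLOCK; i++)
> >>         buf[i] = (char)(rank + 1);
> >>     MPI_File_open(MPI_COMM_WORLD, "gbtest.data",
> >>                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
> >>     MPI_File_write_at(fh, off, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
> >>     MPI_File_sync(fh);
> >>     MPI_Barrier(MPI_COMM_WORLD);
> >>
> >>     /* read the same block back and verify it */
> >>     for (i = 0; i < BLOCK; i++)
> >>         buf[i] = 0;
> >>     MPI_File_read_at(fh, off, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);
> >>     for (i = 0; i < BLOCK; i++)
> >>         if (buf[i] != (char)(rank + 1))
> >>             nerr++;
> >>     if (nerr)
> >>         printf("proc %d: %d data errors near offset %lld\n",
> >>                rank, nerr, (long long)off);
> >>
> >>     MPI_File_close(&fh);
> >>     free(buf);
> >>     MPI_Finalize();
> >>     return 0;
> >> }
> >>
> >> (Compile with mpicc and run with, e.g., mpirun -np 6.)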
> >>
> >> I've sent configure, make and make check logs to the HDF5 help desk but 
> >> haven't gotten a response.
> >>
> >> I am now configuring openmpi (still 1.7.4rc1) with:
> >>
> >> ./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran 
> >> F77=gfortran --enable-static --with-pic --disable-dlopen 
> >> --enable-mpirun-prefix-by-default
> >>
> >> and configuring HDF5 (version 1.8.12) with:
> >>
> >> configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90 
> >> FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel --enable-fortran
> >>
> >> This is the combination that worked for me on Snow Leopard (though with 
> >> earlier versions of both openmpi and hdf5).
> >>
> >> If it matters, the gcc is the stock one with Mavericks' XCode, and 
> >> gfortran is 4.9.0.
> >>
> >> (I just noticed that the mpi fortran wrapper is now mpifort, but I also 
> >> see that mpif90 is still there and is just a link to mpifort.)
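> >>
> >> As a quick sanity check of the resulting library (my own sketch, not part 
> >> of the HDF5 test suite), a program like this should create a file through 
> >> the MPI-IO driver when the parallel build is healthy:
> >>
> >> /* parallel HDF5 smoke test -- my own sketch, file name is arbitrary */
> >> #include <hdf5.h>
> >> #include <mpi.h>
> >> #include <stdio.h>
> >>
> >> int main(int argc, char **argv)
> >> {
> >>     int rank;
> >>     MPI_Init(&argc, &argv);
> >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>
> >>     /* file-access property list that routes I/O through MPI-IO */
> >>     hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
> >>     H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
> >>
> >>     /* collective create and close of a file */
> >>     hid_t file = H5Fcreate("smoke.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
> >>     if (file < 0) {
> >>         if (rank == 0) printf("H5Fcreate failed\n");
> >>     } else {
> >>         H5Fclose(file);
> >>         if (rank == 0) printf("parallel HDF5 smoke test passed\n");
> >>     }
> >>     H5Pclose(fapl);
> >>     MPI_Finalize();
> >>     return 0;
> >> }
> >>
> >> (Build with h5pcc, or with mpicc plus -lhdf5, and run under mpirun.)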
> >>
> >> Any suggestions?
> >>
> >>
> >> On Fri, Jan 17, 2014 at 8:14 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >> sorry for delayed response - just getting back from travel. I don't know 
> >> why you would get that behavior other than a race condition. Afraid that 
> >> code path is foreign to me, but perhaps one of the folks in the MPI-IO 
> >> area can respond
> >>
> >>
> >> On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> >>
> >>> Update: I reconfigured with enable_io_romio=yes, and this time -- mostly 
> >>> -- the test using Sample_mpio.c passes.  Oddly, the very first time I 
> >>> tried it I got errors:
> >>>
> >>> % mpirun -np 2 sampleio
> >>> Proc 1: hostname=Ron-Cohen-MBP.local
> >>> Testing simple C MPIO program with 2 processes accessing file 
> >>> ./mpitest.data
> >>>     (Filename can be specified via program argument)
> >>> Proc 0: hostname=Ron-Cohen-MBP.local
> >>> Proc 1: read data[0:1] got 0, expect 1
> >>> Proc 1: read data[0:2] got 0, expect 2
> >>> Proc 1: read data[0:3] got 0, expect 3
> >>> Proc 1: read data[0:4] got 0, expect 4
> >>> Proc 1: read data[0:5] got 0, expect 5
> >>> Proc 1: read data[0:6] got 0, expect 6
> >>> Proc 1: read data[0:7] got 0, expect 7
> >>> Proc 1: read data[0:8] got 0, expect 8
> >>> Proc 1: read data[0:9] got 0, expect 9
> >>> Proc 1: read data[1:0] got 0, expect 10
> >>> Proc 1: read data[1:1] got 0, expect 11
> >>> Proc 1: read data[1:2] got 0, expect 12
> >>> Proc 1: read data[1:3] got 0, expect 13
> >>> Proc 1: read data[1:4] got 0, expect 14
> >>> Proc 1: read data[1:5] got 0, expect 15
> >>> Proc 1: read data[1:6] got 0, expect 16
> >>> Proc 1: read data[1:7] got 0, expect 17
> >>> Proc 1: read data[1:8] got 0, expect 18
> >>> Proc 1: read data[1:9] got 0, expect 19
> >>> --------------------------------------------------------------------------
> >>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> >>> with errorcode 1.
> >>>
> >>> But when I reran the same mpirun command, the test was successful.  And 
> >>> after deleting the executable, recompiling, and running the same mpirun 
> >>> command again, the test was still successful.  Can someone explain that?
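> >>>
> >>> In case it is relevant, here is a minimal sketch of the write-then-read 
> >>> pattern such a test performs, including the MPI-IO sync/barrier/sync step 
> >>> between the two phases (the file name and data layout are illustrative, 
> >>> not copied from Sample_mpio.c):
> >>>
> >>> /* illustrative only -- not the actual Sample_mpio.c */
> >>> #include <mpi.h>
> >>> #include <stdio.h>
> >>>
> >>> #define N 10   /* ints per rank, arbitrary */
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>>     int rank, nprocs, next, j, err = 0;
> >>>     int wdata[N], rdata[N];
> >>>     MPI_File fh;
> >>>     MPI_Offset woff, roff;
> >>>
> >>>     MPI_Init(&argc, &argv);
> >>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> >>>
> >>>     MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
> >>>                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
> >>>
> >>>     /* write phase: rank r writes the values r*N .. r*N+N-1 at its own
> >>>        offset */
> >>>     woff = (MPI_Offset)rank * N * sizeof(int);
> >>>     for (j = 0; j < N; j++)
> >>>         wdata[j] = rank * N + j;
> >>>     MPI_File_write_at(fh, woff, wdata, N, MPI_INT, MPI_STATUS_IGNORE);
> >>>
> >>>     /* MPI-IO consistency step: sync, barrier, sync before any rank reads
> >>>        data written by another rank */
> >>>     MPI_File_sync(fh);
> >>>     MPI_Barrier(MPI_COMM_WORLD);
> >>>     MPI_File_sync(fh);
> >>>
> >>>     /* read phase: each rank checks the block the next rank wrote */
> >>>     next = (rank + 1) % nprocs;
> >>>     roff = (MPI_Offset)next * N * sizeof(int);
> >>>     MPI_File_read_at(fh, roff, rdata, N, MPI_INT, MPI_STATUS_IGNORE);
> >>>     for (j = 0; j < N; j++)
> >>>         if (rdata[j] != next * N + j) {
> >>>             printf("Proc %d: read data[%d:%d] got %d, expect %d\n",
> >>>                    rank, next, j, rdata[j], next * N + j);
> >>>             err = 1;
> >>>         }
> >>>
> >>>     MPI_File_close(&fh);
> >>>     MPI_Finalize();
> >>>     return err;
> >>> }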
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> >>> Aha.   I guess I didn't know what the io-romio option does.   If you look 
> >>> at my config.log you will see my configure line included 
> >>> --disable-io-romio.    Guess I should change --disable to --enable.
> >>>
> >>> You seem to imply that the nightly build is stable enough that I should 
> >>> probably switch to that rather than 1.7.4rc1.   Am I reading between the 
> >>> lines correctly?
> >>>
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>> Oh, a word of caution on those config params - you might need to check to 
> >>> ensure I don't disable romio in them. I don't normally build it as I 
> >>> don't use it. Since that is what you are trying to use, just change the 
> >>> "no" to "yes" (or delete that line altogether) and it will build.
> >>>
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>> You can find my configure options in the OMPI distribution at 
> >>> contrib/platform/intel/bend/mac. You are welcome to use them - just 
> >>> configure --with-platform=intel/bend/mac
> >>>
> >>> I work on the developer's trunk, of course, but also run the head of the 
> >>> 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
> >>>
> >>> As for the opal_bitmap test: it wouldn't surprise me if that one was 
> >>> stale. I can check on it later tonight, but I'd suspect that the test is 
> >>> bad as we use that class in the code base and haven't seen an issue.
> >>>
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> >>> Ralph,
> >>>
> >>> I just sent out another post with the c file attached.
> >>>
> >>> If you can get that to work, and even if you can't can you tell me what 
> >>> configure options you use, and what version of open-mpi?   Thanks.
> >>>
> >>> Ron
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>> BTW: could you send me your sample test code?
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>> I regularly build on Mavericks and run without problem, though I haven't 
> >>> tried a parallel IO app. I'll give yours a try later, when I get back to 
> >>> my Mac.
> >>>
> >>>
> >>>
> >>> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> >>> I have been struggling to get a usable build of openmpi on Mac OS X 
> >>> Mavericks (10.9.1).  I can get openmpi to configure and build without 
> >>> error, but I have problems after that which depend on the openmpi version.
> >>>
> >>> With 1.6.5, make check fails the opal_datatype_test, ddt_test, and 
> >>> ddt_raw tests.  The various atomic_* tests pass.    See checklogs_1.6.5, 
> >>> attached as a .gz file.
> >>>
> >>> Following suggestions from openmpi discussions I tried openmpi version 
> >>> 1.7.4rc1.  In this case make check indicates all tests passed.  But when 
> >>> I proceeded to try to build a parallel code (parallel HDF5), it failed.  
> >>> In an email exchange, the HDF5 support people suggested I compile and run 
> >>> the attached bit of simple code, Sample_mpio.c (which they supplied); it 
> >>> does not use any HDF5, but just attempts a parallel write to a file 
> >>> followed by a parallel read.  That test failed when requesting more than 
> >>> 1 processor -- which they say indicates a failure of the openmpi 
> >>> installation.  The error message was:
> >>>
> >>> MPI_INIT: argc 1
> >>> MPI_INIT: argc 1
> >>> Testing simple C MPIO program with 2 processes accessing file 
> >>> ./mpitest.data
> >>>     (Filename can be specified via program argument)
> >>> Proc 0: hostname=Ron-Cohen-MBP.local
> >>> Proc 1: hostname=Ron-Cohen-MBP.local
> >>> MPI_BARRIER[0]: comm MPI_COMM_WORLD
> >>> MPI_BARRIER[1]: comm MPI_COMM_WORLD
> >>> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid 
> >>> file)
> >>> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
> >>> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm 
> >>> MPI_COMM_WORLD
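> >>>
> >>> The step that fails is the exclusive-create open near the start of the 
> >>> test; the general shape of that step (my own paraphrase, not the 
> >>> attachment's exact code) is:
> >>>
> >>> /* paraphrase of the exclusive-create step, not the attachment's code */
> >>> #include <mpi.h>
> >>> #include <stdio.h>
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>>     char fname[] = "./mpitest.data";
> >>>     int rank, rc;
> >>>     MPI_File fh;
> >>>
> >>>     MPI_Init(&argc, &argv);
> >>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>
> >>>     /* remove any leftover file from a previous run, on one rank only */
> >>>     if (rank == 0)
> >>>         MPI_File_delete(fname, MPI_INFO_NULL);
> >>>     MPI_Barrier(MPI_COMM_WORLD);
> >>>
> >>>     /* exclusive create: should succeed because the file is now gone */
> >>>     rc = MPI_File_open(MPI_COMM_WORLD, fname,
> >>>                        MPI_MODE_CREATE | MPI_MODE_EXCL | MPI_MODE_RDWR,
> >>>                        MPI_INFO_NULL, &fh);
> >>>     if (rc != MPI_SUCCESS) {
> >>>         if (rank == 0)
> >>>             printf("MPI_File_open with MPI_MODE_EXCL failed\n");
> >>>         MPI_Abort(MPI_COMM_WORLD, 1);
> >>>     }
> >>>
> >>>     MPI_File_close(&fh);
> >>>     if (rank == 0)
> >>>         MPI_File_delete(fname, MPI_INFO_NULL);
> >>>     MPI_Finalize();
> >>>     return 0;
> >>> }
> >>>
> >>> With MPI_MODE_EXCL the open is required to fail if the file already 
> >>> exists, which is why a test like this deletes any leftover file first.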
> >>>
> >>> I then went back to my openmpi directories and tried running some of the 
> >>> individual tests in the test and examples directories.  In particular, in 
> >>> test/class I found one test that does not seem to be run as part of make 
> >>> check and which failed, even with one processor: opal_bitmap.  Not sure if 
> >>> this is because 1.7.4rc1 is incomplete, or there is something wrong with 
> >>> the installation, or maybe a 32 vs 64 bit thing?  The error message is:
> >>>
> >>> mpirun detected that one or more processes exited with non-zero status, 
> >>> thus causing the job to be terminated. The first process to do so was:
> >>>
> >>>   Process name: [[48805,1],0]
> >>>   Exit code:    255
> >>>
> >>> Any suggestions?
> >>>
> >>> More generally, has anyone out there gotten an openmpi build on Mavericks 
> >>> to work well enough that they can get the attached Sample_mpio.c (or 
> >>> better yet, parallel HDF5) to build?
> >>>
> >>> Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4 GB of 
> >>> memory; tried openmpi 1.6.5 and 1.7.4rc1.  Built openmpi against the 
> >>> stock gcc that comes with XCode 5.0.2, and gfortran 4.9.0.
> >>>
> >>> Files attached: config.log.gz, openmpialllog.gz (output of running 
> >>> ompi_info --all), and checklog2.gz (output of make check in the top 
> >>> openmpi directory).
> >>>
> >>> I am not attaching logs of make and install since those seem to have been 
> >>> successful, but can generate those if that would be helpful.
> >>>
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 