Hello Ralph and others,

I just got the following back from the HDF5 support group, suggesting an ompi bug. So I should either try 1.7.3 or a recent nightly of 1.7.4. I will likely opt for 1.7.3, but hopefully someone at Open MPI can look at the problem in 1.7.4. In short, the challenge is to get a parallel HDF5 that passes make check-p with 1.7.4.
------------------

Hi Ron,

I had sent your message to the developer and he can reproduce the issue. Here is what he says:

---
I replicated this on Jam with ompi 1.7.4rc1. I saw the same error he is seeing. Note that this is an unstable release of ompi. I tried ompi 1.7.3 (the feature release, a little more stable) and didn't see the problems there. So this is an ompi bug. He can report it to the ompi list. He can just point them to the t_mpi.c tests in our test suite in testpar/ and say it occurs with their 1.7.4rc1.
---

-Barbara

On Fri, Jan 17, 2014 at 9:39 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> Thanks, I've just gotten an email with some suggestions (and a promise of more help) from the HDF5 support team. I will report back here, as it may be of interest to others trying to build HDF5 on Mavericks.
>
> On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Afraid I have no idea, but hopefully someone else here with experience with HDF5 can chime in?
>>
>> On Jan 17, 2014, at 9:03 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>
>> Still a timely response, thank you. The particular problem I noted hasn't recurred; for reasons I will explain shortly I had to rebuild openmpi again, and this time Sample_mpio.c compiled and ran successfully from the start.
>>
>> But now my problem is getting parallel HDF5 to run. My first attempt to build HDF5 failed at the link stage because of unsatisfied externals from openmpi, and I deduced the problem was having built openmpi with --disable-static. So I rebuilt with --enable-static and --disable-dlopen (emulating a successful openmpi + hdf5 combination I had built on Snow Leopard). Once again openmpi passed its make checks, and as noted above the Sample_mpio.c test compiled and ran fine. The parallel hdf5 configure and make steps also ran successfully. But when I ran make check for hdf5, the serial tests passed but none of the parallel tests did -- over a million test failures! Error messages like:
>>
>> Proc 0: *** MPIO File size range test...
>> --------------------------------
>> MPI_Offset is signed 8 bytes integeral type
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> proc 3: found data error at [2141192192+0], expect -6, got 5
>> proc 3: found data error at [2141192192+1], expect -6, got 5
>>
>> And the specific errors reported -- which processor, which location, and the total number of errors -- change if I rerun make check.
>>
>> I've sent configure, make and make check logs to the HDF5 help desk but haven't gotten a response.
>>
>> I am now configuring openmpi (still 1.7.4rc1) with:
>>
>> ./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran F77=gfortran --enable-static --with-pic --disable-dlopen --enable-mpirun-prefix-by-default
>>
>> and configuring HDF5 (version 1.8.12) with:
>>
>> configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90 FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel --enable-fortran
>>
>> This is the combination that worked for me on Snow Leopard (though with earlier versions of both openmpi and hdf5).
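Aside on the failures quoted above: the "MPIO GB file" write/read checks appear to exercise MPI-IO at offsets up around the 2 GB (31-bit) boundary and beyond -- note the data errors at offset 2141192192 and the "MPI_Offset is signed 8 bytes" line. The sketch below is not the actual t_mpi.c from the HDF5 test suite; it only illustrates that kind of large-offset round trip, with the block sizes chosen purely for illustration (the filename is borrowed from the log).

/* gbfile_sketch.c -- a rough sketch (not the actual t_mpi.c) of the
 * "MPIO GB file" idea: write small blocks at offsets near and beyond
 * 2 GB and read them back.  Creates a large sparse file; name and
 * sizes are illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char fn[] = "MPItest.h5";          /* name borrowed from the log */
    const MPI_Offset one_gb = 1073741824LL;  /* 2^30 bytes */
    char wbuf[16], rbuf[16];
    MPI_File fh;
    int rank, g, nerr = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(wbuf, 'A' + rank, sizeof(wbuf));  /* per-rank byte pattern */

    MPI_File_open(MPI_COMM_WORLD, fn, MPI_MODE_CREATE | MPI_MODE_RDWR,
                  MPI_INFO_NULL, &fh);

    /* Write at the 1, 2, 3, 4 GB marks; offsets past 2 GB only survive if
     * MPI_Offset is carried as a full 64-bit quantity end to end. */
    for (g = 1; g <= 4; g++) {
        MPI_Offset off = g * one_gb + (MPI_Offset)rank * sizeof(wbuf);
        MPI_File_write_at(fh, off, wbuf, (int)sizeof(wbuf), MPI_CHAR,
                          MPI_STATUS_IGNORE);
    }
    MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);

    /* Read the same blocks back and compare. */
    for (g = 1; g <= 4; g++) {
        MPI_Offset off = g * one_gb + (MPI_Offset)rank * sizeof(wbuf);
        MPI_File_read_at(fh, off, rbuf, (int)sizeof(rbuf), MPI_CHAR,
                         MPI_STATUS_IGNORE);
        if (memcmp(wbuf, rbuf, sizeof(wbuf)) != 0) {
            printf("proc %d: found data error at offset %lld\n",
                   rank, (long long)off);
            nerr++;
        }
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return nerr ? 1 : 0;
}

If a sketch like this misbehaves under 1.7.4rc1 but not under 1.7.3, that would point at the same MPI-IO path the HDF5 parallel tests are tripping over.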
>> If it matters, the gcc is the stock one that comes with Mavericks' XCode, and gfortran is 4.9.0.
>>
>> (I just noticed that the MPI Fortran wrapper is now mpifort, but I also see that mpif90 is still there and is just a link to mpifort.)
>>
>> Any suggestions?
>>
>> On Fri, Jan 17, 2014 at 8:14 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Sorry for the delayed response - just getting back from travel. I don't know why you would get that behavior other than a race condition. Afraid that code path is foreign to me, but perhaps one of the folks in the MPI-IO area can respond.
>>>
>>> On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>
>>> Update: I reconfigured with enable_io_romio=yes, and this time -- mostly -- the test using Sample_mpio.c passes. Oddly, the very first time I tried it I got errors:
>>>
>>> % mpirun -np 2 sampleio
>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>>> (Filename can be specified via program argument)
>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>> Proc 1: read data[0:1] got 0, expect 1
>>> Proc 1: read data[0:2] got 0, expect 2
>>> Proc 1: read data[0:3] got 0, expect 3
>>> Proc 1: read data[0:4] got 0, expect 4
>>> Proc 1: read data[0:5] got 0, expect 5
>>> Proc 1: read data[0:6] got 0, expect 6
>>> Proc 1: read data[0:7] got 0, expect 7
>>> Proc 1: read data[0:8] got 0, expect 8
>>> Proc 1: read data[0:9] got 0, expect 9
>>> Proc 1: read data[1:0] got 0, expect 10
>>> Proc 1: read data[1:1] got 0, expect 11
>>> Proc 1: read data[1:2] got 0, expect 12
>>> Proc 1: read data[1:3] got 0, expect 13
>>> Proc 1: read data[1:4] got 0, expect 14
>>> Proc 1: read data[1:5] got 0, expect 15
>>> Proc 1: read data[1:6] got 0, expect 16
>>> Proc 1: read data[1:7] got 0, expect 17
>>> Proc 1: read data[1:8] got 0, expect 18
>>> Proc 1: read data[1:9] got 0, expect 19
>>>
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.
>>>
>>> But when I reran the same mpirun command, the test was successful. And after deleting the executable, recompiling, and running the same mpirun command again, the test was also successful. Can someone explain that?
>>>
>>> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>> Aha. I guess I didn't know what the io-romio option does. If you look at my config.log you will see my configure line included --disable-io-romio. Guess I should change --disable to --enable.
>>>>
>>>> You seem to imply that the nightly build is stable enough that I should probably switch to that rather than 1.7.4rc1. Am I reading between the lines correctly?
>>>>
>>>> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Oh, a word of caution on those config params - you might need to check to ensure I don't disable romio in them. I don't normally build it, as I don't use it. Since that is what you are trying to use, just change the "no" to "yes" (or delete that line altogether) and it will build.
>>>>>
>>>>> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> You can find my configure options in the OMPI distribution at contrib/platform/intel/bend/mac.
>>>>>> You are welcome to use them - just configure --with-platform=intel/bend/mac.
>>>>>>
>>>>>> I work on the developer's trunk, of course, but also run the head of the 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
>>>>>>
>>>>>> As for the opal_bitmap test: it wouldn't surprise me if that one were stale. I can check on it later tonight, but I'd suspect that the test is bad, as we use that class in the code base and haven't seen an issue.
>>>>>>
>>>>>> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>>>>> Ralph,
>>>>>>>
>>>>>>> I just sent out another post with the .c file attached.
>>>>>>>
>>>>>>> If you can get that to work -- and even if you can't -- can you tell me what configure options you use, and what version of open-mpi? Thanks.
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>> BTW: could you send me your sample test code?
>>>>>>>>
>>>>>>>> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> I regularly build on Mavericks and run without problem, though I haven't tried a parallel I/O app. I'll give yours a try later, when I get back to my Mac.
>>>>>>>>>
>>>>>>>>> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>>>>>>>> I have been struggling to get a usable build of openmpi on Mac OS X Mavericks (10.9.1). I can get openmpi to configure and build without error, but have problems after that which depend on the openmpi version.
>>>>>>>>>>
>>>>>>>>>> With 1.6.5, make check fails the opal_datatype_test, ddt_test, and ddt_raw tests. The various atomic_* tests pass. See checklogs_1.6.5, attached as a .gz file.
>>>>>>>>>>
>>>>>>>>>> Following suggestions from openmpi discussions I tried openmpi version 1.7.4rc1. In this case make check indicates all tests passed, but when I proceeded to try to build a parallel code (parallel HDF5), it failed. Following an email exchange with the HDF5 support people, they suggested I compile and run the attached bit of simple code, Sample_mpio.c (which they supplied), which does not use any HDF5 but just attempts a parallel write to a file and a parallel read. That test failed when requesting more than one processor -- which they say indicates a failure of the openmpi installation.
>>>>>>>>>> The error message was:
>>>>>>>>>>
>>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>>> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>>>>>>>>>> (Filename can be specified via program argument)
>>>>>>>>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>>>>>>>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>>>>>>>>> MPI_BARRIER[0]: comm MPI_COMM_WORLD
>>>>>>>>>> MPI_BARRIER[1]: comm MPI_COMM_WORLD
>>>>>>>>>> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid file)
>>>>>>>>>> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
>>>>>>>>>> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm MPI_COMM_WORLD
>>>>>>>>>>
>>>>>>>>>> I then went back to my openmpi directories and tried running some of the individual tests in the test and examples directories. In particular, in test/class I found one test that does not seem to be run as part of make check and which failed, even with one processor: opal_bitmap. I am not sure whether this is because 1.7.4rc1 is incomplete, or there is something wrong with the installation, or maybe it is a 32- vs 64-bit thing? The error message is:
>>>>>>>>>>
>>>>>>>>>> mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
>>>>>>>>>>
>>>>>>>>>> Process name: [[48805,1],0]
>>>>>>>>>> Exit code: 255
>>>>>>>>>>
>>>>>>>>>> Any suggestions?
>>>>>>>>>>
>>>>>>>>>> More generally, has anyone out there gotten an openmpi build on Mavericks to work with sufficient success that they can get the attached Sample_mpio.c (or better yet, parallel HDF5) to build?
>>>>>>>>>>
>>>>>>>>>> Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4 GB memory; tried openmpi 1.6.5 and 1.7.4rc1. Built openmpi against the stock gcc that comes with XCode 5.0.2, and gfortran 4.9.0.
>>>>>>>>>>
>>>>>>>>>> Files attached: config.log.gz, openmpialllog.gz (output of running ompi_info --all), and checklog2.gz (output of make check in the top openmpi directory).
>>>>>>>>>>
>>>>>>>>>> I am not attaching logs of make and install since those seem to have been successful, but I can generate those if that would be helpful.
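For readers without the attachment handy: Sample_mpio.c is not reproduced in this thread. Judging from the output above, it exercises a simple MPI-IO round trip -- an exclusive-create MPI_File_open, a parallel write, and a read-back with verification. The sketch below shows that general pattern only; it is not the HDF5-supplied test, and the block size, messages, and error handling are illustrative.

/* mpio_sketch.c -- a minimal stand-in for the kind of check Sample_mpio.c
 * performs (NOT the HDF5-supplied file): exclusive-create open, parallel
 * write, read-back, verify.  The filename matches the log; the block size
 * is illustrative only. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 10                           /* ints per rank, illustrative */
static const char FILENAME[] = "./mpitest.data";

int main(int argc, char **argv)
{
    int rank, i, nerr = 0;
    int wbuf[BLOCK], rbuf[BLOCK];
    MPI_File fh;
    MPI_Offset off;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 removes any stale file, then everyone opens it collectively
     * with exclusive create -- the call that failed with MPI_ERR_FILE in
     * the output above. */
    if (rank == 0)
        remove(FILENAME);
    MPI_Barrier(MPI_COMM_WORLD);
    if (MPI_File_open(MPI_COMM_WORLD, FILENAME,
                      MPI_MODE_CREATE | MPI_MODE_EXCL | MPI_MODE_RDWR,
                      MPI_INFO_NULL, &fh) != MPI_SUCCESS) {
        fprintf(stderr, "Proc %d: MPI_File_open with MPI_MODE_EXCL failed\n",
                rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Each rank writes a distinct block at its own offset ... */
    off = (MPI_Offset)rank * BLOCK * sizeof(int);
    for (i = 0; i < BLOCK; i++)
        wbuf[i] = rank * BLOCK + i;        /* value encodes rank + position */
    MPI_File_write_at(fh, off, wbuf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Barrier(MPI_COMM_WORLD);

    /* ... then reads it back and checks it survived the round trip. */
    MPI_File_open(MPI_COMM_WORLD, FILENAME, MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, off, rbuf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    for (i = 0; i < BLOCK; i++)
        if (rbuf[i] != wbuf[i]) {
            printf("Proc %d: read data[%d] got %d, expect %d\n",
                   rank, i, rbuf[i], wbuf[i]);
            nerr++;
        }
    printf("Proc %d: %s\n", rank, nerr ? "FAILED" : "all data verified");

    MPI_Finalize();
    return nerr ? 1 : 0;
}

Compiling a sketch like this with mpicc and running it under mpirun -np 2, as in the log above, exercises the same MPI-IO path as the failing test without involving HDF5 at all.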