Hello Ralph and others, I just got the following back from the HDF5
support group, suggesting an ompi bug.  So I should either try 1.7.3 or a
recent 1.7.4 nightly.  I will likely opt for 1.7.3, but hopefully someone
at openmpi can look at the problem in 1.7.4.  In short, the challenge is
to get a parallel HDF5 build that passes make check-p with 1.7.4.
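
For anyone who wants to poke at this without building HDF5, the failing
pattern boils down to each rank writing to a shared file with MPI-IO and
then reading the data back.  Below is a minimal sketch of that pattern (my
own reduction for illustration, not the actual Sample_mpio.c or t_mpi.c
source; the filename and block size are made up):

/* sketch_mpio.c - minimal MPI-IO write/read check (illustrative only).
 * Each rank writes a block of known ints at its own offset, then every
 * rank reads all blocks back and verifies the values. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 10                       /* ints per rank; arbitrary */

int main(int argc, char **argv)
{
    int rank, nprocs, i, r, nerrs = 0;
    int wbuf[BLOCK], rbuf[BLOCK];
    MPI_File fh;
    MPI_Offset off;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* fill this rank's block with predictable values */
    for (i = 0; i < BLOCK; i++)
        wbuf[i] = rank * BLOCK + i;

    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* each rank writes at its own byte offset */
    off = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_write_at(fh, off, wbuf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);

    /* sync-barrier-sync so every rank sees the others' writes */
    MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_File_sync(fh);

    /* every rank reads back every block and checks the contents */
    for (r = 0; r < nprocs; r++) {
        off = (MPI_Offset)r * BLOCK * sizeof(int);
        MPI_File_read_at(fh, off, rbuf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);
        for (i = 0; i < BLOCK; i++)
            if (rbuf[i] != r * BLOCK + i) {
                printf("Proc %d: read data[%d:%d] got %d, expect %d\n",
                       rank, r, i, rbuf[i], r * BLOCK + i);
                nerrs++;
            }
    }

    MPI_File_close(&fh);
    if (nerrs)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with mpirun -np 2, something along these lines
should either pass quietly or print the same kind of "got X, expect Y"
mismatches that the HDF5 parallel tests report.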





------------------
Hi Ron,

I sent your message to the developer, and he can reproduce the issue.
Here is what he says:

 ---
 I replicated this on Jam with ompi 1.7.4rc1 and saw the same error he is
 seeing.  Note that this is an unstable release of ompi.  I tried ompi
 1.7.3 (a feature release, but a little more stable) and didn't see the
 problems there.

 So this is an ompi bug.  He can report it to the ompi list.  He can just
 point them to the t_mpi.c tests in our test suite in testpar/ and say it
 occurs with their 1.7.4rc1.
 ---

-Barbara



On Fri, Jan 17, 2014 at 9:39 AM, Ronald Cohen <rhco...@lbl.gov> wrote:

> Thanks, I've just gotten an email with some suggestions (and a promise of
> more help) from the HDF5 support team.  I will report back here, as it may
> be of interest to others trying to build hdf5 on Mavericks.
>
>
> On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Afraid I have no idea, but hopefully someone else here with experience
>> with HDF5 can chime in?
>>
>>
>> On Jan 17, 2014, at 9:03 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>
>> Still a timely response, thank you.  The particular problem I noted
>> hasn't recurred; for reasons I will explain shortly, I had to rebuild
>> openmpi again, and this time Sample_mpio.c compiled and ran successfully
>> from the start.
>>
>> But now my problem is getting parallel HDF5 to run.  My first attempt to
>> build HDF5 failed at the link stage because of unsatisfied externals from
>> openmpi, and I deduced the problem was having built openmpi with
>> --disable-static.  So I rebuilt with --enable-static and --disable-dlopen
>> (emulating a successful openmpi + hdf5 combination I had built on Snow
>> Leopard).  Once again openmpi passed its make check, and as noted above
>> the Sample_mpio.c test compiled and ran fine.  The parallel hdf5 configure
>> and make steps also ran successfully.  But when I ran make check for hdf5,
>> the serial tests passed and none of the parallel tests did: over a million
>> test failures!  Error messages like:
>>
>> Proc 0: *** MPIO File size range test...
>> --------------------------------
>> MPI_Offset is signed 8 bytes integeral type
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> proc 3: found data error at [2141192192+0], expect -6, got 5
>> proc 3: found data error at [2141192192+1], expect -6, got 5
>>
>> And the specific errors reported (which processor, which location) and
>> the total number of errors change if I rerun make check.
>>
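>> (For context, my understanding is that the GB tests write and then read
>> back blocks at offsets around and beyond 2 GiB, which matches the ~2 GiB
>> offsets in the data errors above.  The sketch below is my own
>> illustration of that pattern, with a made-up filename and block size,
>> not the actual t_mpi.c code.)
>>
>> /* gb_offset_sketch.c - write and read one block past 2 GiB
>>  * (illustrative only). */
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> #define BLKSIZE (1024 * 1024)              /* 1 MiB per block */
>>
>> int main(int argc, char **argv)
>> {
>>     int  rank;
>>     char wbuf[BLKSIZE], rbuf[BLKSIZE];
>>     MPI_File   fh;
>>     MPI_Offset base = (MPI_Offset)2 * 1024 * 1024 * 1024;  /* 2 GiB */
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     /* MPI_Offset has to be a 64-bit type for offsets past 2 GiB */
>>     if (rank == 0)
>>         printf("sizeof(MPI_Offset) = %d\n", (int)sizeof(MPI_Offset));
>>
>>     memset(wbuf, rank + 1, BLKSIZE);       /* recognizable per-rank fill */
>>
>>     MPI_File_open(MPI_COMM_WORLD, "./gbtest.data",
>>                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>>
>>     /* each rank writes its own 1 MiB slot just past the 2 GiB mark */
>>     MPI_File_write_at(fh, base + (MPI_Offset)rank * BLKSIZE,
>>                       wbuf, BLKSIZE, MPI_BYTE, MPI_STATUS_IGNORE);
>>
>>     /* sync-barrier-sync before reading back */
>>     MPI_File_sync(fh);
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_File_sync(fh);
>>
>>     /* read the same slot back and verify it round-tripped */
>>     MPI_File_read_at(fh, base + (MPI_Offset)rank * BLKSIZE,
>>                      rbuf, BLKSIZE, MPI_BYTE, MPI_STATUS_IGNORE);
>>     if (memcmp(wbuf, rbuf, BLKSIZE) != 0)
>>         printf("proc %d: data mismatch past 2 GiB\n", rank);
>>
>>     MPI_File_close(&fh);
>>     MPI_Finalize();
>>     return 0;
>> }
>>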
>> I've sent configure, make and make check logs to the HDF5 help desk but
>> haven't gotten a response.
>>
>> I am now configuring openmpi (still 1.7.4rc1) with:
>>
>> ./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran
>> F77=gfortran --enable-static --with-pic --disable-dlopen
>> --enable-mpirun-prefix-by-default
>>
>> and configuring HDF5 (version 1.8.12) with:
>>
>> configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90
>> FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel --enable-fortran
>>
>> This is the combination that worked for me on Snow Leopard (though with
>> earlier versions of both openmpi and hdf5).
>>
>> If it matters, the gcc is the stock one that comes with Mavericks' Xcode,
>> and gfortran is 4.9.0.
>>
>> (I just noticed that the MPI Fortran wrapper is now mpifort, but I also
>> see that mpif90 is still there and is just a link to mpifort.)
>>
>> Any suggestions?
>>
>>
>> On Fri, Jan 17, 2014 at 8:14 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Sorry for the delayed response - just getting back from travel.  I don't
>>> know why you would get that behavior other than a race condition.  Afraid
>>> that code path is foreign to me, but perhaps one of the folks in the
>>> MPI-IO area can respond.
>>>
>>>
>>> On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>
>>> Update: I reconfigured with enable_io_romio=yes, and this time the test
>>> using Sample_mpio.c mostly passes.  Oddly, the very first time I tried it
>>> I got errors:
>>>
>>> % mpirun -np 2 sampleio
>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>> Testing simple C MPIO program with 2 processes accessing file
>>> ./mpitest.data
>>>     (Filename can be specified via program argument)
>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>> Proc 1: read data[0:1] got 0, expect 1
>>> Proc 1: read data[0:2] got 0, expect 2
>>> Proc 1: read data[0:3] got 0, expect 3
>>> Proc 1: read data[0:4] got 0, expect 4
>>> Proc 1: read data[0:5] got 0, expect 5
>>> Proc 1: read data[0:6] got 0, expect 6
>>> Proc 1: read data[0:7] got 0, expect 7
>>> Proc 1: read data[0:8] got 0, expect 8
>>> Proc 1: read data[0:9] got 0, expect 9
>>> Proc 1: read data[1:0] got 0, expect 10
>>> Proc 1: read data[1:1] got 0, expect 11
>>> Proc 1: read data[1:2] got 0, expect 12
>>> Proc 1: read data[1:3] got 0, expect 13
>>> Proc 1: read data[1:4] got 0, expect 14
>>> Proc 1: read data[1:5] got 0, expect 15
>>> Proc 1: read data[1:6] got 0, expect 16
>>> Proc 1: read data[1:7] got 0, expect 17
>>> Proc 1: read data[1:8] got 0, expect 18
>>> Proc 1: read data[1:9] got 0, expect 19
>>>
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>>> with errorcode 1.
>>>
>>> But when I reran the same mpirun command, the test was successful.
>>> Deleting the executable, recompiling, and then running the same mpirun
>>> command again was also successful.  Can someone explain that?
>>>
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>
>>>> Aha.   I guess I didn't know what the io-romio option does.   If you
>>>> look at my config.log you will see my configure line included
>>>> --disable-io-romio.    Guess I should change --disable to --enable.
>>>>
>>>> You seem to imply that the nightly build is stable enough that I should
>>>> probably switch to that rather than 1.7.4rc1.   Am I reading between the
>>>> lines correctly?
>>>>
>>>>
>>>>
>>>> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Oh, a word of caution on those config params - you might need to check
>>>>> to ensure I don't disable romio in them. I don't normally build it as I
>>>>> don't use it. Since that is what you are trying to use, just change the
>>>>> "no" to "yes" (or delete that line altogether) and it will build.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>
>>>>>> You can find my configure options in the OMPI distribution at
>>>>>> contrib/platform/intel/bend/mac. You are welcome to use them - just
>>>>>> configure --with-platform=intel/bend/mac
>>>>>>
>>>>>> I work on the developer's trunk, of course, but also run the head of
>>>>>> the 1.7.4 branch (essentially the nightly tarball) on a fairly regular
>>>>>> basis.
>>>>>>
>>>>>> As for the opal_bitmap test: it wouldn't surprise me if that one was
>>>>>> stale. I can check on it later tonight, but I'd suspect that the test is
>>>>>> bad as we use that class in the code base and haven't seen an issue.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>>>>
>>>>>>> Ralph,
>>>>>>>
>>>>>>> I just sent out another post with the c file attached.
>>>>>>>
>>>>>>> If you can get that to work, and even if you can't, can you tell me
>>>>>>> what configure options you use, and what version of open-mpi?  Thanks.
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>> BTW: could you send me your sample test code?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>>> I regularly build on Mavericks and run without problem, though I
>>>>>>>>> haven't tried a parallel IO app.  I'll give yours a try later, when
>>>>>>>>> I get back to my Mac.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> I have been struggling to get a usable build of openmpi on Mac OS X
>>>>>>>>>> Mavericks (10.9.1).  I can get openmpi to configure and build
>>>>>>>>>> without error, but have problems after that which depend on the
>>>>>>>>>> openmpi version.
>>>>>>>>>>
>>>>>>>>>> With 1.6.5, make check fails the opal_datatype_test, ddt_test,
>>>>>>>>>> and ddt_raw tests.  The various atomic_* tests pass.    See
>>>>>>>>>> checklogs_1.6.5, attached as a .gz file.
>>>>>>>>>>
>>>>>>>>>> Following suggestions from openmpi discussions, I tried openmpi
>>>>>>>>>> version 1.7.4rc1.  In this case make check indicates all tests
>>>>>>>>>> passed.  But when I proceeded to build a parallel code (parallel
>>>>>>>>>> HDF5) it failed.  After an email exchange with the HDF5 support
>>>>>>>>>> people, they suggested I compile and run the attached bit of simple
>>>>>>>>>> code, Sample_mpio.c (which they supplied); it does not use any HDF5
>>>>>>>>>> but just attempts a parallel write to a file and a parallel read.
>>>>>>>>>> That test failed when requesting more than one processor, which
>>>>>>>>>> they say indicates a failure of the openmpi installation.  The
>>>>>>>>>> error message was:
>>>>>>>>>>
>>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>>> MPI_INIT: argc 1
>>>>>>>>>> Testing simple C MPIO program with 2 processes accessing file
>>>>>>>>>> ./mpitest.data
>>>>>>>>>>     (Filename can be specified via program argument)
>>>>>>>>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>>>>>>>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>>>>>>>>> MPI_BARRIER[0]: comm MPI_COMM_WORLD
>>>>>>>>>> MPI_BARRIER[1]: comm MPI_COMM_WORLD
>>>>>>>>>> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE:
>>>>>>>>>> invalid file)
>>>>>>>>>> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
>>>>>>>>>> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0
>>>>>>>>>> comm MPI_COMM_WORLD
>>>>>>>>>>
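>>>>>>>>>> (For what it's worth, my reading of that first error is that the
>>>>>>>>>> test does an exclusive create of the data file.  A guess at that
>>>>>>>>>> step, not the actual Sample_mpio.c source, would look roughly like
>>>>>>>>>> the program below; rank 0 removes any stale file first, so the
>>>>>>>>>> collective create with MPI_MODE_EXCL should normally succeed.)
>>>>>>>>>>
>>>>>>>>>> /* excl_open_sketch.c - exclusive collective create (illustrative
>>>>>>>>>>  * guess at the failing step, not the HDF5-supplied code). */
>>>>>>>>>> #include <mpi.h>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>
>>>>>>>>>> int main(int argc, char **argv)
>>>>>>>>>> {
>>>>>>>>>>     int rank, rc;
>>>>>>>>>>     MPI_File fh;
>>>>>>>>>>
>>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>>>
>>>>>>>>>>     if (rank == 0)
>>>>>>>>>>         remove("./mpitest.data");  /* clear any stale file */
>>>>>>>>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>>>>>>>>
>>>>>>>>>>     /* collective create; MPI_MODE_EXCL makes it an error if the
>>>>>>>>>>      * file already exists */
>>>>>>>>>>     rc = MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
>>>>>>>>>>                        MPI_MODE_CREATE | MPI_MODE_EXCL |
>>>>>>>>>>                        MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>>>>>>>>>>     if (rc != MPI_SUCCESS) {
>>>>>>>>>>         printf("Proc %d: exclusive create failed\n", rank);
>>>>>>>>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     MPI_File_close(&fh);
>>>>>>>>>>     MPI_Finalize();
>>>>>>>>>>     return 0;
>>>>>>>>>> }
>>>>>>>>>>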
>>>>>>>>>> I then went back to my openmpi directories and tried running some
>>>>>>>>>> of the individual tests in the test and examples directories.  In
>>>>>>>>>> particular, in test/class I found one test that seems not to be run
>>>>>>>>>> as part of make check and which failed, even with one processor:
>>>>>>>>>> opal_bitmap.  I am not sure whether this is because 1.7.4rc1 is
>>>>>>>>>> incomplete, or there is something wrong with the installation, or
>>>>>>>>>> maybe a 32 vs 64 bit thing?  The error message is
>>>>>>>>>>
>>>>>>>>>> mpirun detected that one or more processes exited with non-zero
>>>>>>>>>> status, thus causing the job to be terminated.  The first process
>>>>>>>>>> to do so was:
>>>>>>>>>>
>>>>>>>>>>   Process name: [[48805,1],0]
>>>>>>>>>>   Exit code:    255
>>>>>>>>>>
>>>>>>>>>> Any suggestions?
>>>>>>>>>>
>>>>>>>>>> More generally, has anyone out there gotten an openmpi build on
>>>>>>>>>> Mavericks working well enough that they can get the attached
>>>>>>>>>> Sample_mpio.c (or better yet, parallel HDF5) to build?
>>>>>>>>>>
>>>>>>>>>> Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4
>>>>>>>>>> GB of memory; tried openmpi 1.6.5 and 1.7.4rc1.  Built openmpi
>>>>>>>>>> against the stock gcc that comes with Xcode 5.0.2, and gfortran
>>>>>>>>>> 4.9.0.
>>>>>>>>>>
>>>>>>>>>> Files attached: config.log.gz, openmpialllog.gz (output of running
>>>>>>>>>> ompi_info --all), and checklog2.gz (output of make check in the top
>>>>>>>>>> openmpi directory).
>>>>>>>>>>
>>>>>>>>>> I am not attaching logs of make and install since those seem to
>>>>>>>>>> have been successful, but can generate those if that would be 
>>>>>>>>>> helpful.
>>>>>>>>>>