Can you specify exactly which issue you're referring to?

- test failing when you had ROMIO disabled
- test (sometimes) failing when you had ROMIO enabled
- compiling / linking issues?
On Jan 17, 2014, at 1:50 PM, Ronald Cohen <rhco...@lbl.gov> wrote:

> Hello Ralph and others, I just got the following back from the HDF-5
> support group, suggesting an ompi bug. So I should either try 1.7.3 or a
> recent nightly 1.7.4. I will likely opt for 1.7.3, but hopefully someone at
> openmpi can look at the problem for 1.7.4. In short, the challenge is to
> get a parallel hdf5 that passes make check-p with 1.7.4.
>
> ------------------
> Hi Ron,
>
> I sent your message to the developer and he can reproduce the issue. Here
> is what he says:
>
> ---
> I replicated this on Jam with ompi 1.7.4rc1. I saw the same error he is
> seeing. Note that this is an unstable release of ompi.
> I tried ompi 1.7.3 (a feature release - a little more stable). I didn't
> see the problems there.
>
> So this is an ompi bug. He can report it to the ompi list. He can just
> point them to the t_mpi.c tests in our test suite in testpar/ and say it
> occurs with their 1.7.4rc1.
> ---
>
> -Barbara
>
> On Fri, Jan 17, 2014 at 9:39 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> Thanks, I've just gotten an email with some suggestions (and a promise of
> more help) from the HDF5 support team. I will report back here, as it may
> be of interest to others trying to build hdf5 on Mavericks.
>
> On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Afraid I have no idea, but hopefully someone else here with experience
> with HDF5 can chime in?
>
> On Jan 17, 2014, at 9:03 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>
>> Still a timely response, thank you. The particular problem I noted hasn't
>> recurred; for reasons I will explain shortly I had to rebuild openmpi
>> again, and this time Sample_mpio.c compiled and ran successfully from the
>> start.
>>
>> But now my problem is trying to get parallel HDF5 to run.
>> In my first attempt to build HDF5 it failed in the link stage because of
>> unsatisfied externals from openmpi, and I deduced the problem was having
>> built openmpi with --disable-static. So I rebuilt with --enable-static
>> and --disable-dlopen (emulating a successful openmpi + hdf5 combination I
>> had built on Snow Leopard). Once again openmpi passed its make checks,
>> and as noted above the Sample_mpio.c test compiled and ran fine. The
>> parallel hdf5 configure and make steps also ran successfully. But when I
>> ran make check for hdf5, the serial tests passed but none of the parallel
>> tests did. Over a million test failures! Error messages like:
>>
>> Proc 0: *** MPIO File size range test...
>> --------------------------------
>> MPI_Offset is signed 8 bytes integeral type
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file write test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> MPIO GB file read test MPItest.h5
>> proc 3: found data error at [2141192192+0], expect -6, got 5
>> proc 3: found data error at [2141192192+1], expect -6, got 5
>>
>> And -- the specific errors reported, which processor, which location, and
>> the total number of errors change if I rerun make check.
>>
>> I've sent configure, make and make check logs to the HDF5 help desk but
>> haven't gotten a response.
>> I am now configuring openmpi (still 1.7.4rc1) with:
>>
>> ./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran
>>   F77=gfortran --enable-static --with-pic --disable-dlopen
>>   --enable-mpirun-prefix-by-default
>>
>> and configuring HDF5 (version 1.8.12) with:
>>
>> configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90
>>   FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel
>>   --enable-fortran
>>
>> This is the combination that worked for me on Snow Leopard (though with
>> earlier versions of both openmpi and hdf5).
>>
>> If it matters, the gcc is the stock one with Mavericks' XCode, and
>> gfortran is 4.9.0.
>>
>> (I just noticed that the mpi fortran wrapper is now mpifort, but I also
>> see that mpif90 is still there and is just a link to mpifort.)
>>
>> Any suggestions?
>>
>> On Fri, Jan 17, 2014 at 8:14 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Sorry for the delayed response - just getting back from travel. I don't
>> know why you would get that behavior other than a race condition. Afraid
>> that code path is foreign to me, but perhaps one of the folks in the
>> MPI-IO area can respond.
>>
>> On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>
>>> Update: I reconfigured with enable_io_romio=yes, and this time --
>>> mostly -- the test using Sample_mpio.c passes.
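[Editor's note: the two configure invocations quoted above, collected into a single build sequence. This is a sketch only - the source-tree directory names and the PATH export are my assumptions, not from the thread; the configure flags are the ones quoted.]

```shell
#!/bin/sh
# Sketch of the build sequence described in the thread. Directory names
# (openmpi-1.7.4rc1/, hdf5-1.8.12/) are assumed, not stated in the thread.
set -e

cd openmpi-1.7.4rc1
./configure --prefix=/usr/local/openmpi CC=gcc CXX=g++ FC=gfortran \
  F77=gfortran --enable-static --with-pic --disable-dlopen \
  --enable-mpirun-prefix-by-default
make && make check && make install
cd ..

# Ensure the freshly installed wrappers (mpicc, mpif90) are found first.
export PATH=/usr/local/openmpi/bin:$PATH

cd hdf5-1.8.12
./configure --prefix=/usr/local/hdf5/par CC=mpicc CFLAGS=-fPIC FC=mpif90 \
  FCFLAGS=-fPIC CXX=mpicxx CXXFLAGS=-fPIC --enable-parallel --enable-fortran
make && make check && make install   # "make check-p" runs only the parallel tests
```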
>>> Oddly, the very first time I tried I got errors:
>>>
>>> % mpirun -np 2 sampleio
>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>>> (Filename can be specified via program argument)
>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>> Proc 1: read data[0:1] got 0, expect 1
>>> Proc 1: read data[0:2] got 0, expect 2
>>> Proc 1: read data[0:3] got 0, expect 3
>>> Proc 1: read data[0:4] got 0, expect 4
>>> Proc 1: read data[0:5] got 0, expect 5
>>> Proc 1: read data[0:6] got 0, expect 6
>>> Proc 1: read data[0:7] got 0, expect 7
>>> Proc 1: read data[0:8] got 0, expect 8
>>> Proc 1: read data[0:9] got 0, expect 9
>>> Proc 1: read data[1:0] got 0, expect 10
>>> Proc 1: read data[1:1] got 0, expect 11
>>> Proc 1: read data[1:2] got 0, expect 12
>>> Proc 1: read data[1:3] got 0, expect 13
>>> Proc 1: read data[1:4] got 0, expect 14
>>> Proc 1: read data[1:5] got 0, expect 15
>>> Proc 1: read data[1:6] got 0, expect 16
>>> Proc 1: read data[1:7] got 0, expect 17
>>> Proc 1: read data[1:8] got 0, expect 18
>>> Proc 1: read data[1:9] got 0, expect 19
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>>> with errorcode 1.
>>>
>>> But when I reran the same mpirun command, the test was successful. And
>>> after deleting the executable, recompiling, and again running the same
>>> mpirun command, the test was still successful. Can someone explain that?
>>>
>>> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>> Aha. I guess I didn't know what the io-romio option does. If you look
>>> at my config.log you will see my configure line included
>>> --disable-io-romio. Guess I should change --disable to --enable.
>>>
>>> You seem to imply that the nightly build is stable enough that I should
>>> probably switch to that rather than 1.7.4rc1. Am I reading between the
>>> lines correctly?
>>> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Oh, a word of caution on those config params - you might need to check
>>> to ensure I don't disable romio in them. I don't normally build it, as I
>>> don't use it. Since that is what you are trying to use, just change the
>>> "no" to "yes" (or delete that line altogether) and it will build.
>>>
>>> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> You can find my configure options in the OMPI distribution at
>>> contrib/platform/intel/bend/mac. You are welcome to use them - just
>>> configure --with-platform=intel/bend/mac
>>>
>>> I work on the developer's trunk, of course, but also run the head of the
>>> 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
>>>
>>> As for the opal_bitmap test: it wouldn't surprise me if that one was
>>> stale. I can check on it later tonight, but I'd suspect that the test is
>>> bad, as we use that class in the code base and haven't seen an issue.
>>>
>>> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>> Ralph,
>>>
>>> I just sent out another post with the c file attached.
>>>
>>> If you can get that to work - and even if you can't - can you tell me
>>> what configure options you use, and what version of open-mpi? Thanks.
>>>
>>> Ron
>>>
>>> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> BTW: could you send me your sample test code?
>>>
>>> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I regularly build on Mavericks and run without problem, though I haven't
>>> tried a parallel IO app. I'll give yours a try later, when I get back to
>>> my Mac.
>>>
>>> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
>>> I have been struggling to get a usable build of openmpi on Mac OS X
>>> Mavericks (10.9.1).
>>> I can get openmpi to configure and build without error, but have
>>> problems after that which depend on the openmpi version.
>>>
>>> With 1.6.5, make check fails the opal_datatype_test, ddt_test, and
>>> ddt_raw tests. The various atomic_* tests pass. See checklogs_1.6.5,
>>> attached as a .gz file.
>>>
>>> Following suggestions from openmpi discussions I tried openmpi version
>>> 1.7.4rc1. In this case make check indicates all tests passed. But when I
>>> proceeded to try to build a parallel code (parallel HDF5) it failed.
>>> Following an email exchange with the HDF5 support people, they suggested
>>> I compile and run the attached bit of simple code Sample_mpio.c (which
>>> they supplied); it does not use any HDF5, but just attempts a parallel
>>> write to a file and a parallel read. That test failed when requesting
>>> more than 1 processor -- which they say indicates a failure of the
>>> openmpi installation. The error message was:
>>>
>>> MPI_INIT: argc 1
>>> MPI_INIT: argc 1
>>> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>>> (Filename can be specified via program argument)
>>> Proc 0: hostname=Ron-Cohen-MBP.local
>>> Proc 1: hostname=Ron-Cohen-MBP.local
>>> MPI_BARRIER[0]: comm MPI_COMM_WORLD
>>> MPI_BARRIER[1]: comm MPI_COMM_WORLD
>>> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid file)
>>> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
>>> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm
>>> MPI_COMM_WORLD
>>>
>>> I then went back to my openmpi directories and tried running some of the
>>> individual tests in the test and examples directories. In particular, in
>>> test/class I found one test that seems not to be run as part of make
>>> check and that failed, even with one processor: opal_bitmap. Not sure if
>>> this is because 1.7.4rc1 is incomplete, or there is something wrong with
>>> the installation, or maybe a 32 vs 64 bit thing?
>>> The error message is:
>>>
>>> mpirun detected that one or more processes exited with non-zero status,
>>> thus causing the job to be terminated. The first process to do so was:
>>>
>>> Process name: [[48805,1],0]
>>> Exit code: 255
>>>
>>> Any suggestions?
>>>
>>> More generally, has anyone out there gotten an openmpi build on
>>> Mavericks to work with sufficient success that they can get the attached
>>> Sample_mpio.c (or better yet, parallel HDF5) to build?
>>>
>>> Details: Running Mac OS X 10.9.1 on a mid-2009 Macbook Pro with 4 GB
>>> memory; tried openmpi 1.6.5 and 1.7.4rc1. Built openmpi against the
>>> stock gcc that comes with XCode 5.0.2, and gfortran 4.9.0.
>>>
>>> Files attached: config.log.gz, openmpialllog.gz (output of running
>>> ompi_info --all), checklog2.gz (output of make check in top openmpi
>>> directory).
>>>
>>> I am not attaching logs of make and install since those seem to have
>>> been successful, but can generate those if that would be helpful.
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/