Good suggestions, and thanks! But since I haven't been able to get the problem to recur, and I'm stuck now on other issues related to getting parallel HDF5 to pass its make check, I will likely not follow up on this particular (non-recurring) issue. (Perhaps I should forward your comments to the HDF5 support team, since this is their test.)
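
For the archives: below is a minimal sketch of the workaround Jeff suggests in the message quoted below -- an MPI_Barrier between the close/delete and the subsequent open with MPI_MODE_EXCL, so that no rank attempts the exclusive create before the delete has happened. This is not the HDF5-supplied Sample_mpio.c; the file name, open flags, and error handling here are just illustrative assumptions.

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch (NOT the HDF5-supplied Sample_mpio.c): create and close a
 * file, delete it, then re-open it with MPI_MODE_EXCL, with a barrier in
 * between so that no rank attempts the exclusive create before the delete
 * has completed.  File name, flags, and error handling are illustrative. */
int main(int argc, char *argv[])
{
    MPI_File fh;
    int rank, rc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Create the file collectively, then close it. */
    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);

    /* Delete from one rank only; MPI_File_delete is not collective. */
    if (rank == 0)
        MPI_File_delete("./mpitest.data", MPI_INFO_NULL);

    /* Jeff's suggested workaround: make sure the delete has happened before
     * any rank tries the exclusive-create open. */
    MPI_Barrier(MPI_COMM_WORLD);

    rc = MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                       MPI_MODE_CREATE | MPI_MODE_EXCL | MPI_MODE_RDWR,
                       MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        if (rank == 0)
            fprintf(stderr, "MPI_File_open with MPI_MODE_EXCL failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Whether the standard requires the delete to be visible to all ranks before anyone reopens is exactly the question Jeff raises; the barrier at least removes the ordering part of the race.
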
On Fri, Jan 17, 2014 at 10:12 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> I'm looking at your code, and I'm not actually an expert in the MPI IO
> stuff... but do you have a race condition between the file close+delete and
> the open with EXCL?
>
> I'm asking because I don't know offhand whether the file close+delete is
> supposed to be collective and not return until the file is guaranteed to be
> deleted, as visible from all MPI processes, or not.
>
> If that guarantee is not provided, then perhaps a barrier between the
> close+delete and the next file_open should be sufficient to avoid the
> race...?
>
>
> On Jan 15, 2014, at 7:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>
> > Update: I reconfigured with enable_io_romio=yes, and this time -- mostly --
> > the test using Sample_mpio.c passes. Oddly, the very first time I tried,
> > I got errors:
> >
> > % mpirun -np 2 sampleio
> > Proc 1: hostname=Ron-Cohen-MBP.local
> > Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
> > (Filename can be specified via program argument)
> > Proc 0: hostname=Ron-Cohen-MBP.local
> > Proc 1: read data[0:1] got 0, expect 1
> > Proc 1: read data[0:2] got 0, expect 2
> > Proc 1: read data[0:3] got 0, expect 3
> > Proc 1: read data[0:4] got 0, expect 4
> > Proc 1: read data[0:5] got 0, expect 5
> > Proc 1: read data[0:6] got 0, expect 6
> > Proc 1: read data[0:7] got 0, expect 7
> > Proc 1: read data[0:8] got 0, expect 8
> > Proc 1: read data[0:9] got 0, expect 9
> > Proc 1: read data[1:0] got 0, expect 10
> > Proc 1: read data[1:1] got 0, expect 11
> > Proc 1: read data[1:2] got 0, expect 12
> > Proc 1: read data[1:3] got 0, expect 13
> > Proc 1: read data[1:4] got 0, expect 14
> > Proc 1: read data[1:5] got 0, expect 15
> > Proc 1: read data[1:6] got 0, expect 16
> > Proc 1: read data[1:7] got 0, expect 17
> > Proc 1: read data[1:8] got 0, expect 18
> > Proc 1: read data[1:9] got 0, expect 19
> > --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > But when I reran the same mpirun command, the test succeeded. And after
> > deleting the executable, recompiling, and running the same mpirun command
> > again, it succeeded as well. Can someone explain that?
> >
> >
> > On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > Aha. I guess I didn't know what the io-romio option does. If you look at
> > my config.log you will see my configure line included --disable-io-romio.
> > Guess I should change --disable to --enable.
> >
> > You seem to imply that the nightly build is stable enough that I should
> > probably switch to that rather than 1.7.4rc1. Am I reading between the
> > lines correctly?
> >
> >
> > On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > Oh, a word of caution on those config params - you might need to check to
> > ensure I don't disable romio in them. I don't normally build it, as I don't
> > use it. Since that is what you are trying to use, just change the "no" to
> > "yes" (or delete that line altogether) and it will build.
> >
> >
> > On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > You can find my configure options in the OMPI distribution at
> > contrib/platform/intel/bend/mac.
> > You are welcome to use them - just configure --with-platform=intel/bend/mac.
> >
> > I work on the developer's trunk, of course, but also run the head of the
> > 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
> >
> > As for the opal_bitmap test: it wouldn't surprise me if that one was stale.
> > I can check on it later tonight, but I'd suspect that the test is bad, as
> > we use that class in the code base and haven't seen an issue.
> >
> >
> > On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > Ralph,
> >
> > I just sent out another post with the C file attached.
> >
> > If you can get that to work -- and even if you can't -- can you tell me
> > what configure options you use, and what version of Open MPI? Thanks.
> >
> > Ron
> >
> >
> > On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > BTW: could you send me your sample test code?
> >
> >
> > On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > I regularly build on Mavericks and run without problem, though I haven't
> > tried a parallel IO app. I'll give yours a try later, when I get back to
> > my Mac.
> >
> >
> > On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > I have been struggling to get a usable build of Open MPI on Mac OS X
> > Mavericks (10.9.1). I can get Open MPI to configure and build without
> > error, but have problems after that which depend on the Open MPI version.
> >
> > With 1.6.5, make check fails the opal_datatype_test, ddt_test, and ddt_raw
> > tests. The various atomic_* tests pass. See checklogs_1.6.5, attached as a
> > .gz file.
> >
> > Following suggestions from Open MPI discussions, I tried Open MPI version
> > 1.7.4rc1. In this case make check indicates all tests passed. But when I
> > proceeded to try to build a parallel code (parallel HDF5), it failed.
> > Following an email exchange with the HDF5 support people, they suggested I
> > try to compile and run the attached bit of simple code, Sample_mpio.c
> > (which they supplied). It does not use any HDF5; it just attempts a
> > parallel write to a file followed by a parallel read. That test failed
> > when requesting more than 1 processor -- which they say indicates a
> > failure of the Open MPI installation. The error message was:
> >
> > MPI_INIT: argc 1
> > MPI_INIT: argc 1
> > Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
> > (Filename can be specified via program argument)
> > Proc 0: hostname=Ron-Cohen-MBP.local
> > Proc 1: hostname=Ron-Cohen-MBP.local
> > MPI_BARRIER[0]: comm MPI_COMM_WORLD
> > MPI_BARRIER[1]: comm MPI_COMM_WORLD
> > Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid file)
> > MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
> > MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm MPI_COMM_WORLD
> >
> > I then went back to my Open MPI directories and tried running some of the
> > individual tests in the test and examples directories. In particular, in
> > test/class I found one test that does not seem to be run as part of make
> > check and which failed, even with one processor: opal_bitmap. I am not sure
> > whether this is because 1.7.4rc1 is incomplete, or there is something wrong
> > with the installation, or maybe a 32- vs 64-bit thing? The error message is:
> >
> > mpirun detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >
> > Process name: [[48805,1],0]
> > Exit code: 255
> >
> > Any suggestions?
> > More generally, has anyone out there gotten an Open MPI build on Mavericks
> > to work with sufficient success that they can get the attached
> > Sample_mpio.c (or better yet, parallel HDF5) to build?
> >
> > Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4 GB of
> > memory; tried Open MPI 1.6.5 and 1.7.4rc1. Built Open MPI against the
> > stock gcc that comes with Xcode 5.0.2, and gfortran 4.9.0.
> >
> > Files attached: config.log.gz, openmpialllog.gz (output of running
> > ompi_info --all), checklog2.gz (output of make check in the top openmpi
> > directory).
> >
> > I am not attaching logs of make and install, since those seem to have been
> > successful, but I can generate them if that would be helpful.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
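
P.S. For anyone who wants to try something like this without the attachment: below is a minimal sketch of the kind of standalone check Sample_mpio.c performs -- each rank writes a block of known integers at its own offset, then every rank reads the whole file back and verifies it. This is not the HDF5-supplied file; the file name, block size, and output format are illustrative assumptions. Compiling with mpicc and running under mpirun -np 2 should exercise the same MPI-IO code path.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NVALS 10  /* values written per rank; illustrative, not from Sample_mpio.c */

/* Minimal sketch (NOT the HDF5-supplied Sample_mpio.c): each rank writes a
 * block of known integers at its own offset, then every rank reads the whole
 * file back and verifies the contents. */
int main(int argc, char *argv[])
{
    MPI_File fh;
    int rank, nprocs, i, nerr = 0;
    int buf[NVALS], *all;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (i = 0; i < NVALS; i++)
        buf[i] = rank * NVALS + i;
    offset = (MPI_Offset)rank * NVALS * sizeof(int);

    /* Collective open; each rank writes its own block. */
    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, offset, buf, NVALS, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Barrier(MPI_COMM_WORLD);

    /* Re-open read-only; every rank reads and checks the whole file. */
    all = malloc((size_t)nprocs * NVALS * sizeof(int));
    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, 0, all, nprocs * NVALS, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    for (i = 0; i < nprocs * NVALS; i++) {
        if (all[i] != i) {
            printf("Proc %d: read data[%d:%d] got %d, expect %d\n",
                   rank, i / NVALS, i % NVALS, all[i], i);
            nerr++;
        }
    }
    free(all);

    if (nerr > 0)
        MPI_Abort(MPI_COMM_WORLD, 1);
    if (rank == 0)
        printf("Parallel write/read check passed on %d processes\n", nprocs);

    MPI_Finalize();
    return 0;
}
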