Good suggestions, and thanks!   But since I haven't been able to get the
problem to recur, and I'm now stuck on other issues related to getting
parallel hdf5 to pass its make check, I will likely not follow up on this
particular (non-recurring) issue (except that maybe I should forward your
comments to the HDF5 support team, since this is their test).


On Fri, Jan 17, 2014 at 10:12 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> I'm looking at your code, and I'm not actually an expert in the MPI IO
> stuff... but do you have a race condition between the file close+delete and
> the open with EXCL?
>
> I'm asking because I don't know offhand whether the file close+delete is
> supposed to be collective and not return until the file is guaranteed to be
> deleted, as visible from all MPI processes.
>
> If this guarantee is not provided, then perhaps a barrier between the
> close+delete and the next file_open should be sufficient to avoid the
> race...?
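
(For reference, the barrier Jeff suggests would sit between the close+delete
and the exclusive re-open, roughly as in the sketch below.  This is not the
actual Sample_mpio.c code (that file is attached elsewhere in the thread);
the function name and variables here are only illustrative.)

#include <mpi.h>

/* Sketch: delete a file and re-create it with MPI_MODE_EXCL, with a barrier
 * so that no rank attempts the exclusive open before the delete is visible
 * to every process. */
static int recreate_exclusive(char *filename, MPI_Comm comm, MPI_File *fh)
{
    int rank, rc;
    MPI_Comm_rank(comm, &rank);

    MPI_File_close(fh);                           /* all ranks close the old handle */
    if (rank == 0)
        MPI_File_delete(filename, MPI_INFO_NULL); /* one rank removes the file */

    MPI_Barrier(comm);                            /* the barrier suggested above */

    rc = MPI_File_open(comm, filename,
                       MPI_MODE_CREATE | MPI_MODE_EXCL | MPI_MODE_RDWR,
                       MPI_INFO_NULL, fh);
    return rc;
}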
>
>
> On Jan 15, 2014, at 7:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
>
> > Update: I reconfigured with enable_io_romio=yes, and this time -- mostly
> -- the test using Sample_mpio.c  passes.   Oddly the very first time I
> tried I got errors:
> >
> > % mpirun -np 2 sampleio
> > Proc 1: hostname=Ron-Cohen-MBP.local
> > Testing simple C MPIO program with 2 processes accessing file
> ./mpitest.data
> >     (Filename can be specified via program argument)
> > Proc 0: hostname=Ron-Cohen-MBP.local
> > Proc 1: read data[0:1] got 0, expect 1
> > Proc 1: read data[0:2] got 0, expect 2
> > Proc 1: read data[0:3] got 0, expect 3
> > Proc 1: read data[0:4] got 0, expect 4
> > Proc 1: read data[0:5] got 0, expect 5
> > Proc 1: read data[0:6] got 0, expect 6
> > Proc 1: read data[0:7] got 0, expect 7
> > Proc 1: read data[0:8] got 0, expect 8
> > Proc 1: read data[0:9] got 0, expect 9
> > Proc 1: read data[1:0] got 0, expect 10
> > Proc 1: read data[1:1] got 0, expect 11
> > Proc 1: read data[1:2] got 0, expect 12
> > Proc 1: read data[1:3] got 0, expect 13
> > Proc 1: read data[1:4] got 0, expect 14
> > Proc 1: read data[1:5] got 0, expect 15
> > Proc 1: read data[1:6] got 0, expect 16
> > Proc 1: read data[1:7] got 0, expect 17
> > Proc 1: read data[1:8] got 0, expect 18
> > Proc 1: read data[1:9] got 0, expect 19
> >
> --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > But when I reran the same mpirun command, the test was successful.  And
> after deleting the executable, recompiling, and running the same mpirun
> command again, the test was also successful.   Can someone explain that?
> >
> >
> >
> >
> > On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > Aha.   I guess I didn't know what the io-romio option does.   If you
> look at my config.log you will see my configure line included
> --disable-io-romio.    Guess I should change --disable to --enable.
> >
> > You seem to imply that the nightly build is stable enough that I should
> probably switch to that rather than 1.7.4rc1.   Am I reading between the
> lines correctly?
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org>
> wrote:
> > Oh, a word of caution on those config params - you might need to check
> to ensure I don't disable romio in them. I don't normally build it as I
> don't use it. Since that is what you are trying to use, just change the
> "no" to "yes" (or delete that line altogether) and it will build.
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org>
> wrote:
> > You can find my configure options in the OMPI distribution at
> contrib/platform/intel/bend/mac. You are welcome to use them - just
> configure --with-platform=intel/bend/mac
> >
> > I work on the developer's trunk, of course, but also run the head of the
> 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
> >
> > As for the opal_bitmap test: it wouldn't surprise me if that one was
> stale. I can check on it later tonight, but I'd suspect that the test is
> bad as we use that class in the code base and haven't seen an issue.
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > Ralph,
> >
> > I just sent out another post with the c file attached.
> >
> > If you can get that to work (and even if you can't), can you tell me what
> configure options you use, and what version of open-mpi?   Thanks.
> >
> > Ron
> >
> >
> > On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org>
> wrote:
> > BTW: could you send me your sample test code?
> >
> >
> > On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org>
> wrote:
> > I regularly build on Mavericks and run without problem, though I haven't
> tried a parallel IO app. I'll give yours a try later, when I get back to my
> Mac.
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> > I have been struggling to get a usable build of openmpi on Mac
> OS X Mavericks (10.9.1).  I can get openmpi to configure and build without
> error, but I run into problems after that which depend on the openmpi version.
> >
> > With 1.6.5, make check fails the opal_datatype_test, ddt_test, and
> ddt_raw tests.  The various atomic_* tests pass.    See checklogs_1.6.5,
> attached as a .gz file.
> >
> > Following suggestions from openmpi discussions, I tried openmpi version
> 1.7.4rc1.  In this case make check indicated that all tests passed.  But when
> I proceeded to build a parallel code (parallel HDF5), the build failed.
>  After an email exchange, the HDF5 support people suggested that I compile
> and run the attached bit of simple code, Sample_mpio.c (which they supplied);
> it does not use any HDF5, but just attempts a parallel write to a file
> followed by a parallel read.   That test failed when requesting more than
> 1 processor -- which they say indicates a failure of the openmpi
> installation.   The error message was:
> >
> > MPI_INIT: argc 1
> > MPI_INIT: argc 1
> > Testing simple C MPIO program with 2 processes accessing file
> ./mpitest.data
> >     (Filename can be specified via program argument)
> > Proc 0: hostname=Ron-Cohen-MBP.local
> > Proc 1: hostname=Ron-Cohen-MBP.local
> > MPI_BARRIER[0]: comm MPI_COMM_WORLD
> > MPI_BARRIER[1]: comm MPI_COMM_WORLD
> > Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid
> file)
> > MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
> > MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm
> MPI_COMM_WORLD
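
(For context, the test described above attempts a parallel write to a shared
file followed by a parallel read and verification, presumably along the lines
of the sketch below.  This is not the actual attached Sample_mpio.c; the file
name ./mpitest.data is taken from the output above, while DIMSIZE and the
other names are made up for illustration.)

#include <mpi.h>
#include <stdio.h>

#define DIMSIZE 10                 /* illustrative block size per rank */

int main(int argc, char **argv)
{
    int rank, i, nerrors = 0;
    char wbuf[DIMSIZE], rbuf[DIMSIZE];
    MPI_File fh;
    MPI_Offset off;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills a distinct block and writes it at its own offset. */
    for (i = 0; i < DIMSIZE; i++)
        wbuf[i] = (char)(rank * DIMSIZE + i);
    off = (MPI_Offset)rank * DIMSIZE;

    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, off, wbuf, DIMSIZE, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Don't start reading until every rank has finished writing and closed. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_read_at(fh, off, rbuf, DIMSIZE, MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Verify that each rank reads back exactly what it wrote. */
    for (i = 0; i < DIMSIZE; i++)
        if (rbuf[i] != wbuf[i]) {
            fprintf(stderr, "Proc %d: block[%d] got %d, expect %d\n",
                    rank, i, rbuf[i], wbuf[i]);
            nerrors++;
        }

    MPI_Finalize();
    return nerrors ? 1 : 0;
}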
> >
> > I then went back to my openmpi directories and tried running some of the
> individual tests in the test and examples directories.  In particular, in
> test/class I found one test that does not seem to be run as part of make
> check and that failed, even with one processor: opal_bitmap.  I am not sure
> whether this is because 1.7.4rc1 is incomplete, or there is something wrong
> with the installation, or maybe it is a 32- vs. 64-bit issue?   The error
> message is
> >
> > mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
> >
> >   Process name: [[48805,1],0]
> >   Exit code:    255
> >
> > Any suggestions?
> >
> > More generally, has anyone out there gotten an openmpi build on Mavericks
> working well enough that they can get the attached Sample_mpio.c (or,
> better yet, parallel HDF5) to build?
> >
> > Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4 GB of
> memory; tried openmpi 1.6.5 and 1.7.4rc1.  Built openmpi against the stock
> gcc that comes with Xcode 5.0.2, and gfortran 4.9.0.
> >
> > Files attached: config.log.gz, openmpialllog.gz (the output of running
> ompi_info --all), and checklog2.gz (the output of make check in the top
> openmpi directory).
> >
> > I am not attaching the logs of make and install, since those seem to have
> been successful, but I can generate them if that would be helpful.
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
