Sorry for the delayed response - just getting back from travel. I don't know 
why you would get that behavior other than a race condition. I'm afraid that 
code path is foreign to me, but perhaps one of the folks in the MPI-IO area 
can respond.
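
Purely speculating about where a race could hide: if the test follows the
usual delete-then-exclusive-create pattern, the window looks something like
the sketch below. This is my guess at the shape of such code, not the actual
Sample_mpio.c -- the filename and the ordering are assumptions.

/* Hypothetical sketch (NOT the actual Sample_mpio.c) of a
 * delete-then-exclusive-create sequence that can race between ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, rc;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 removes any stale file left by a previous run
     * (the error return is ignored if no file exists). */
    if (rank == 0)
        MPI_File_delete("./mpitest.data", MPI_INFO_NULL);

    /* Without this barrier the exclusive create can outrun rank 0's
     * delete; MPI_MODE_EXCL then makes the open fail with MPI_ERR_FILE
     * because the old file still exists. */
    MPI_Barrier(MPI_COMM_WORLD);

    rc = MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
                       MPI_MODE_CREATE | MPI_MODE_EXCL | MPI_MODE_RDWR,
                       MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        printf("Proc %d: exclusive create failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}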


On Jan 15, 2014, at 4:26 PM, Ronald Cohen <rhco...@lbl.gov> wrote:

> Update: I reconfigured with enable_io_romio=yes, and this time -- mostly -- 
> the test using Sample_mpio.c passes. Oddly, the very first time I tried, I 
> got errors:
> 
> % mpirun -np 2 sampleio 
> Proc 1: hostname=Ron-Cohen-MBP.local
> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>     (Filename can be specified via program argument)
> Proc 0: hostname=Ron-Cohen-MBP.local
> Proc 1: read data[0:1] got 0, expect 1
> Proc 1: read data[0:2] got 0, expect 2
> Proc 1: read data[0:3] got 0, expect 3
> Proc 1: read data[0:4] got 0, expect 4
> Proc 1: read data[0:5] got 0, expect 5
> Proc 1: read data[0:6] got 0, expect 6
> Proc 1: read data[0:7] got 0, expect 7
> Proc 1: read data[0:8] got 0, expect 8
> Proc 1: read data[0:9] got 0, expect 9
> Proc 1: read data[1:0] got 0, expect 10
> Proc 1: read data[1:1] got 0, expect 11
> Proc 1: read data[1:2] got 0, expect 12
> Proc 1: read data[1:3] got 0, expect 13
> Proc 1: read data[1:4] got 0, expect 14
> Proc 1: read data[1:5] got 0, expect 15
> Proc 1: read data[1:6] got 0, expect 16
> Proc 1: read data[1:7] got 0, expect 17
> Proc 1: read data[1:8] got 0, expect 18
> Proc 1: read data[1:9] got 0, expect 19
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
> with errorcode 1.
> 
> But when I reran the same mpirun command, the test was successful. And after 
> deleting the executable, recompiling, and then running the same mpirun 
> command again, the test was also successful. Can someone explain that?
> 
> 
> 
> 
> On Wed, Jan 15, 2014 at 1:16 PM, Ronald Cohen <rhco...@lbl.gov> wrote:
> Aha. I guess I didn't know what the io-romio option does. If you look at my 
> config.log you will see my configure line included --disable-io-romio. I 
> guess I should change --disable to --enable.
> 
> You seem to imply that the nightly build is stable enough that I should 
> probably switch to that rather than 1.7.4rc1.   Am I reading between the 
> lines correctly?
> 
> 
> 
> On Wed, Jan 15, 2014 at 10:56 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Oh, a word of caution on those config params - you might need to check to 
> ensure I don't disable romio in them. I don't normally build it as I don't 
> use it. Since that is what you are trying to use, just change the "no" to 
> "yes" (or delete that line altogether) and it will build.
> 
> 
> 
> On Wed, Jan 15, 2014 at 10:53 AM, Ralph Castain <r...@open-mpi.org> wrote:
> You can find my configure options in the OMPI distribution at 
> contrib/platform/intel/bend/mac. You are welcome to use them - just configure 
> --with-platform=intel/bend/mac
> 
> I work on the developer's trunk, of course, but also run the head of the 
> 1.7.4 branch (essentially the nightly tarball) on a fairly regular basis.
> 
> As for the opal_bitmap test: it wouldn't surprise me if that one was stale. I 
> can check on it later tonight, but I'd suspect that the test is bad as we use 
> that class in the code base and haven't seen an issue.
> 
> 
> 
> On Wed, Jan 15, 2014 at 10:49 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> Ralph,
> 
> I just sent out another post with the c file attached.
> 
> Whether or not you can get that to work, can you tell me what configure 
> options you use, and which version of Open MPI? Thanks.
> 
> Ron
> 
> 
> On Wed, Jan 15, 2014 at 10:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
> BTW: could you send me your sample test code?
> 
> 
> On Wed, Jan 15, 2014 at 10:34 AM, Ralph Castain <r...@open-mpi.org> wrote:
> I regularly build on Mavericks and run without problem, though I haven't 
> tried a parallel IO app. I'll give yours a try later, when I get back to my 
> Mac.
> 
> 
> 
> On Wed, Jan 15, 2014 at 10:04 AM, Ronald Cohen <rhco...@lbl.gov> wrote:
> I have been struggling to get a usable build of Open MPI on Mac OS X 
> Mavericks (10.9.1). I can get Open MPI to configure and build without error, 
> but run into problems after that which depend on the Open MPI version.
> 
> With 1.6.5, make check fails the opal_datatype_test, ddt_test, and ddt_raw 
> tests.  The various atomic_* tests pass.    See checklogs_1.6.5, attached as 
> a .gz file.
> 
> Following suggestions from Open MPI discussions I tried Open MPI version 
> 1.7.4rc1. In this case make check indicates that all tests passed. But when 
> I proceeded to build a parallel code (parallel HDF5), it failed. In an email 
> exchange, the HDF5 support people suggested I compile and run the attached 
> bit of simple code, Sample_mpio.c (which they supplied); it does not use any 
> HDF5, but just attempts a parallel write to a file followed by a parallel 
> read. That test failed when requesting more than one processor -- which they 
> say indicates a failure of the Open MPI installation. The error message was:
> 
> MPI_INIT: argc 1
> MPI_INIT: argc 1
> Testing simple C MPIO program with 2 processes accessing file ./mpitest.data
>     (Filename can be specified via program argument)
> Proc 0: hostname=Ron-Cohen-MBP.local
> Proc 1: hostname=Ron-Cohen-MBP.local
> MPI_BARRIER[0]: comm MPI_COMM_WORLD
> MPI_BARRIER[1]: comm MPI_COMM_WORLD
> Proc 0: MPI_File_open with MPI_MODE_EXCL failed (MPI_ERR_FILE: invalid file)
> MPI_ABORT[0]: comm MPI_COMM_WORLD errorcode 1
> MPI_BCAST[1]: buffer 7fff5a483048 count 1 datatype MPI_INT root 0 comm 
> MPI_COMM_WORLD
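> 
> For context, my understanding of what the test does is a write followed by a 
> read-back and verify. The boiled-down sketch below is my own reconstruction, 
> not the actual Sample_mpio.c -- the filename, block size, and the 
> sync/barrier/sync step are assumptions:
> 
> /* Sketch (my reconstruction, NOT the actual Sample_mpio.c): each rank
>  * writes 10 ints at its own offset, then every rank reads the whole
>  * file back and checks the values. */
> #include <mpi.h>
> #include <stdio.h>
> 
> #define DIMSIZE 10
> 
> int main(int argc, char **argv)
> {
>     int rank, nprocs, i, irank;
>     int wbuf[DIMSIZE], rbuf[DIMSIZE];
>     MPI_File fh;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> 
>     MPI_File_open(MPI_COMM_WORLD, "./mpitest.data",
>                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
> 
>     /* Each rank writes its own block: value = rank*10 + index. */
>     for (i = 0; i < DIMSIZE; i++)
>         wbuf[i] = rank * DIMSIZE + i;
>     MPI_File_write_at(fh, (MPI_Offset)rank * DIMSIZE * sizeof(int),
>                       wbuf, DIMSIZE, MPI_INT, MPI_STATUS_IGNORE);
> 
>     /* Standard MPI-IO sync/barrier/sync idiom to make the writes
>      * visible to all ranks; skipping it can produce exactly the
>      * "got 0" reads seen above. */
>     MPI_File_sync(fh);
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_File_sync(fh);
> 
>     /* Every rank reads every block and verifies it. */
>     for (irank = 0; irank < nprocs; irank++) {
>         MPI_File_read_at(fh, (MPI_Offset)irank * DIMSIZE * sizeof(int),
>                          rbuf, DIMSIZE, MPI_INT, MPI_STATUS_IGNORE);
>         for (i = 0; i < DIMSIZE; i++)
>             if (rbuf[i] != irank * DIMSIZE + i)
>                 printf("Proc %d: read data[%d:%d] got %d, expect %d\n",
>                        rank, irank, i, rbuf[i], irank * DIMSIZE + i);
>     }
> 
>     MPI_File_close(&fh);
>     MPI_Finalize();
>     return 0;
> }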
> 
> I then went back to my Open MPI directories and tried running some of the 
> individual tests in the test and examples directories. In particular, in 
> test/class I found one test, opal_bitmap, that does not seem to be run as 
> part of make check and which failed, even with one processor. I am not sure 
> whether this is because 1.7.4rc1 is incomplete, or something is wrong with 
> the installation, or maybe it is a 32- vs. 64-bit issue. The error message is:
> 
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing the job to be terminated. The first process to do so was:
> 
>   Process name: [[48805,1],0]
>   Exit code:    255
> 
> Any suggestions?
> 
> More generally, has anyone out there gotten an Open MPI build on Mavericks 
> working well enough that they can build and run the attached Sample_mpio.c 
> (or better yet, parallel HDF5)?
> 
> Details: Running Mac OS X 10.9.1 on a mid-2009 MacBook Pro with 4 GB of 
> memory; tried Open MPI 1.6.5 and 1.7.4rc1. Built Open MPI against the stock 
> gcc that comes with Xcode 5.0.2, and gfortran 4.9.0.
> 
> Files attached: config.log.gz, openmpialllog.gz (output of running ompi_info 
> --all), checklog2.gz (output of make check in the top Open MPI directory).
> 
> I am not attaching logs of make and install since those seem to have been 
> successful, but can generate those if that would be helpful.
> 