Hm, thanks for the report; I will look into this. I did not run the romio 
tests, but the hdf5 tests are run regularly, and with 3.1.2 you should not 
have any problems on a regular Unix fs. How many processes did you use, and 
which tests did you run specifically? The main tests that I execute from 
their parallel test suite are testphdf5 and t_shapesame.
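
For reference, those are plain MPI programs, so they can be run directly 
with mpirun, e.g. along these lines (the process count is just an example):

  mpirun -np 6 --mca io ompio ./testphdf5
  mpirun -np 6 --mca io ompio ./t_shapesame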

In the next couple of days I will also look into the testmpio test that you 
mentioned.
Thanks
Edgar


> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Dave
> Love
> Sent: Monday, October 8, 2018 10:20 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] ompio on Lustre
> 
> I said I'd report back about trying ompio on Lustre mounted without flock.
> 
> I couldn't immediately figure out how to run MTT.  I tried the parallel
> hdf5 tests from hdf5 1.10.3, but I got errors with those even with the
> relevant environment variable set to put the test files on (local) /tmp.
> Then it occurred to me, rather late, that romio would have tests.  I took
> the "runtests" script in the romio/test directory from ompi 3.1.2, modified
> it to use "--mca io ompio", built the tests against an installed ompi-3.1.2,
> and ran it on no-flock-mounted Lustre; it did this and apparently hung at
> the end:
> 
>   **** Testing simple.c ****
>    No Errors
>   **** Testing async.c ****
>    No Errors
>   **** Testing async-multiple.c ****
>    No Errors
>   **** Testing atomicity.c ****
>   Process 3: readbuf[118] is 0, should be 10
>   Process 2: readbuf[65] is 0, should be 10
>   --------------------------------------------------------------------------
>   MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
>   with errorcode 1.
> 
>   NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>   You may or may not see output from other processes, depending on
>   exactly when Open MPI kills them.
>   --------------------------------------------------------------------------
>   Process 1: readbuf[145] is 0, should be 10
>   **** Testing coll_test.c ****
>    No Errors
>   **** Testing excl.c ****
>   error opening file test
>   error opening file test
>   error opening file test
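> 
> (For context: atomicity.c enables MPI_File_set_atomicity, has rank 0 write
> a buffer of 10s, and after a barrier the other ranks read the region back
> and expect every element to be 10, which is what the "should be 10"
> messages refer to.  A rough sketch of that pattern, my reconstruction
> rather than the actual test code:
> 
>   #include <mpi.h>
>   #include <stdio.h>
>   #define BUFSIZE 1000   /* made-up size, not the test's */
>   int main(int argc, char **argv)
>   {
>       int rank, i, writebuf[BUFSIZE], readbuf[BUFSIZE];
>       MPI_File fh;
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_File_open(MPI_COMM_WORLD, "test",
>                     MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>       MPI_File_set_atomicity(fh, 1);   /* atomic mode for all accesses */
>       if (rank == 0) {
>           for (i = 0; i < BUFSIZE; i++) writebuf[i] = 10;
>           MPI_File_write_at(fh, 0, writebuf, BUFSIZE, MPI_INT,
>                             MPI_STATUS_IGNORE);
>       }
>       /* in atomic mode the write must be visible after this barrier */
>       MPI_Barrier(MPI_COMM_WORLD);
>       if (rank != 0) {
>           MPI_File_read_at(fh, 0, readbuf, BUFSIZE, MPI_INT,
>                            MPI_STATUS_IGNORE);
>           for (i = 0; i < BUFSIZE; i++)
>               if (readbuf[i] != 10)
>                   printf("Process %d: readbuf[%d] is %d, should be 10\n",
>                          rank, i, readbuf[i]);
>       }
>       MPI_File_close(&fh);
>       MPI_Finalize();
>       return 0;
>   }
> 
> So the failures above mean the readers saw stale data even though atomic
> mode was on.)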
> 
> Then I ran on local /tmp as a sanity check and still got errors:
> 
>   **** Testing I/O functions ****
>   **** Testing simple.c ****
>    No Errors
>   **** Testing async.c ****
>    No Errors
>   **** Testing async-multiple.c ****
>    No Errors
>   **** Testing atomicity.c ****
>   Process 2: readbuf[155] is 0, should be 10
>   Process 1: readbuf[128] is 0, should be 10
>   Process 3: readbuf[128] is 0, should be 10
>   --------------------------------------------------------------------------
>   MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
>   with errorcode 1.
> 
>   NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>   You may or may not see output from other processes, depending on
>   exactly when Open MPI kills them.
>   --------------------------------------------------------------------------
>   **** Testing coll_test.c ****
>    No Errors
>   **** Testing excl.c ****
>    No Errors
>   **** Testing file_info.c ****
>    No Errors
>   **** Testing i_noncontig.c ****
>    No Errors
>   **** Testing noncontig.c ****
>    No Errors
>   **** Testing noncontig_coll.c ****
>    No Errors
>   **** Testing noncontig_coll2.c ****
>    No Errors
>   **** Testing aggregation1 ****
>    No Errors
>   **** Testing aggregation2 ****
>    No Errors
>   **** Testing hindexed ****
>    No Errors
>   **** Testing misc.c ****
>   file pointer posn = 265, should be 10
> 
>   byte offset = 3020, should be 1080
> 
>   file pointer posn = 265, should be 10
> 
>   byte offset = 3020, should be 1080
> 
>   file pointer posn = 265, should be 10
> 
>   byte offset = 3020, should be 1080
> 
>   file pointer posn in bytes = 3280, should be 1000
> 
>   file pointer posn = 265, should be 10
> 
>   byte offset = 3020, should be 1080
> 
>   file pointer posn in bytes = 3280, should be 1000
> 
>   file pointer posn in bytes = 3280, should be 1000
> 
>   file pointer posn in bytes = 3280, should be 1000
> 
>   Found 12 errors
>   **** Testing shared_fp.c ****
>    No Errors
>   **** Testing ordered_fp.c ****
>    No Errors
>   **** Testing split_coll.c ****
>    No Errors
>   **** Testing psimple.c ****
>    No Errors
>   **** Testing error.c ****
>   File set view did not return an error
>    Found 1 errors
>   **** Testing status.c ****
>    No Errors
>   **** Testing types_with_zeros ****
>    No Errors
>   **** Testing darray_read ****
>    No Errors
> 
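> (The misc failures look like file-view bookkeeping: after
> MPI_File_set_view, MPI_File_get_position reports the current offset in
> etype units relative to the view, and MPI_File_get_byte_offset converts
> such an offset back to an absolute byte displacement.  A minimal sketch of
> that pattern, with a made-up filetype rather than whatever misc.c really
> uses:
> 
>   #include <mpi.h>
>   #include <stdio.h>
>   int main(int argc, char **argv)
>   {
>       MPI_File fh;
>       MPI_Offset pos, bytes;
>       MPI_Datatype ftype;
>       MPI_Init(&argc, &argv);
>       MPI_File_open(MPI_COMM_WORLD, "test",
>                     MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>       /* noncontiguous filetype: 5 blocks of 2 ints, stride 6 ints */
>       MPI_Type_vector(5, 2, 6, MPI_INT, &ftype);
>       MPI_Type_commit(&ftype);
>       MPI_File_set_view(fh, 0, MPI_INT, ftype, "native", MPI_INFO_NULL);
>       MPI_File_seek(fh, 10, MPI_SEEK_SET);       /* 10 etypes into view */
>       MPI_File_get_position(fh, &pos);           /* should come back as 10 */
>       MPI_File_get_byte_offset(fh, pos, &bytes); /* view offset -> bytes */
>       printf("posn = %lld, bytes = %lld\n",
>              (long long)pos, (long long)bytes);
>       MPI_Type_free(&ftype);
>       MPI_File_close(&fh);
>       MPI_Finalize();
>       return 0;
>   }
> 
> The "file pointer posn = 265, should be 10" output suggests the position is
> being reported in some other unit, or against the wrong view.)
> 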
> I even got an error with romio on /tmp (modifying the script to use
> mpirun --mca io romio314):
> 
>   **** Testing error.c ****
>   Unexpected error message MPI_ERR_ARG: invalid argument of some other kind
>    Found 1 errors
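> 
> (error.c checks error handling on deliberately bad arguments, so the two
> failures are ompio returning MPI_SUCCESS where an error is required, and
> romio returning an unexpected error class.  A hypothetical sketch of that
> kind of check; the invalid displacement here is my guess, not necessarily
> the argument error.c actually passes:
> 
>   #include <mpi.h>
>   #include <stdio.h>
>   int main(int argc, char **argv)
>   {
>       MPI_File fh;
>       int err, errclass;
>       MPI_Init(&argc, &argv);
>       MPI_File_open(MPI_COMM_WORLD, "test",
>                     MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>       MPI_File_set_errhandler(fh, MPI_ERRORS_RETURN);
>       /* a negative displacement is invalid, so this must fail */
>       err = MPI_File_set_view(fh, (MPI_Offset)-1, MPI_INT, MPI_INT,
>                               "native", MPI_INFO_NULL);
>       if (err == MPI_SUCCESS)
>           printf("File set view did not return an error\n");
>       else {
>           MPI_Error_class(err, &errclass);
>           printf("got error class %d\n", errclass);
>       }
>       MPI_File_close(&fh);
>       MPI_Finalize();
>       return 0;
>   }
> 
> The MPI_SUCCESS branch mirrors the "did not return an error" output above.)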