I said I'd report back about trying ompio on Lustre mounted without flock. I couldn't immediately figure out how to run MTT. I tried the parallel hdf5 tests from hdf5 1.10.3, but I got errors with those even with the relevant environment variable set to put the files on (local) /tmp. Then it occurred to me, rather late, that romio would have tests. So I built the tests in the romio/test directory from ompi 3.1.2 against an installed ompi-3.1.2 and ran the "runtests" script, modified to use "--mca io ompio", on no-flock-mounted Lustre. It did this and apparently hung at the end:
**** Testing simple.c ****
No Errors
**** Testing async.c ****
No Errors
**** Testing async-multiple.c ****
No Errors
**** Testing atomicity.c ****
Process 3: readbuf[118] is 0, should be 10
Process 2: readbuf[65] is 0, should be 10
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Process 1: readbuf[145] is 0, should be 10
**** Testing coll_test.c ****
No Errors
**** Testing excl.c ****
error opening file test
error opening file test
error opening file test

Then I ran on local /tmp as a sanity check and still got errors:

**** Testing I/O functions ****
**** Testing simple.c ****
No Errors
**** Testing async.c ****
No Errors
**** Testing async-multiple.c ****
No Errors
**** Testing atomicity.c ****
Process 2: readbuf[155] is 0, should be 10
Process 1: readbuf[128] is 0, should be 10
Process 3: readbuf[128] is 0, should be 10
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
**** Testing coll_test.c ****
No Errors
**** Testing excl.c ****
No Errors
**** Testing file_info.c ****
No Errors
**** Testing i_noncontig.c ****
No Errors
**** Testing noncontig.c ****
No Errors
**** Testing noncontig_coll.c ****
No Errors
**** Testing noncontig_coll2.c ****
No Errors
**** Testing aggregation1 ****
No Errors
**** Testing aggregation2 ****
No Errors
**** Testing hindexed ****
No Errors
**** Testing misc.c ****
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn in bytes = 3280, should be 1000
file pointer posn = 265, should be 10
byte offset = 3020, should be 1080
file pointer posn in bytes = 3280, should be 1000
file pointer posn in bytes = 3280, should be 1000
file pointer posn in bytes = 3280, should be 1000
Found 12 errors
**** Testing shared_fp.c ****
No Errors
**** Testing ordered_fp.c ****
No Errors
**** Testing split_coll.c ****
No Errors
**** Testing psimple.c ****
No Errors
**** Testing error.c ****
File set view did not return an error
Found 1 errors
**** Testing status.c ****
No Errors
**** Testing types_with_zeros ****
No Errors
**** Testing darray_read ****
No Errors

I even got an error with romio on /tmp (modifying the script to use mpirun --mca io romio314):

**** Testing error.c ****
Unexpected error message MPI_ERR_ARG: invalid argument of some other kind
Found 1 errors
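In case it helps to reproduce the atomicity failures outside the ROMIO test harness, here is a minimal sketch of the same kind of atomic-mode write-then-read check. The buffer size, value written, and filename are my own choices, not what atomicity.c actually uses, so treat it as an approximation rather than the real test:

/* atomic_sketch.c: rank 0 writes a block of 10s with atomic mode on;
 * the other ranks read it back after a barrier and check the values.
 * Build with mpicc; run e.g.:
 *   mpirun -n 4 --mca io ompio ./atomic_sketch /lustre/path/testfile
 */
#include <mpi.h>
#include <stdio.h>

#define BUFSIZE 1000   /* arbitrary size for this sketch */

int main(int argc, char **argv)
{
    int rank, i, errs = 0;
    int writebuf[BUFSIZE], readbuf[BUFSIZE];
    const char *filename = (argc > 1) ? argv[1] : "testfile";
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, filename,
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_set_atomicity(fh, 1);   /* the mode the failing test exercises */

    if (rank == 0) {
        for (i = 0; i < BUFSIZE; i++) writebuf[i] = 10;
        MPI_File_write_at(fh, 0, writebuf, BUFSIZE, MPI_INT, &status);
    }
    /* With atomic mode, data written before the barrier should be
     * visible to reads on the same file handle after it. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank != 0) {
        MPI_File_read_at(fh, 0, readbuf, BUFSIZE, MPI_INT, &status);
        for (i = 0; i < BUFSIZE; i++) {
            if (readbuf[i] != 10 && errs++ < 3)
                printf("Process %d: readbuf[%d] is %d, should be 10\n",
                       rank, i, readbuf[i]);
        }
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return errs ? 1 : 0;
}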