Thanks for the advice, Edgar. This appears to help but does not
eliminate the problem. This is what I observe (out of maybe 10 trials)
when using '-mca io romio314':
- no failures using 40 processes across 2 nodes (each node has 20 cores)
- no failures if using 'MPI_File_write_at'
- same type of
I opened an issue on this and hope to have a fix available next week.
https://github.com/open-mpi/ompi/issues/4334
Thanks
Edgar
On 10/12/2017 8:36 PM, Edgar Gabriel wrote:
For now, try switching to the romio314 component in Open MPI. There is
an issue with NFS and OMPIO that I am aware of and working on, which
might trigger this behavior (although it should actually work for
collective I/O even in that case).
Try setting something like:
mpirun --mca io romio314 ..
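
For anyone following along, here is a minimal sketch of two ways to apply the suggestion above. It relies on Open MPI's convention of reading MCA parameters from environment variables named OMPI_MCA_<param>. The program name ./my_io_test and the process count are placeholders, not taken from this thread, and the ompi_info check only runs if that tool is on your PATH.

```shell
# Select the ROMIO-based io component for every mpirun started from this shell.
# Open MPI picks up MCA parameters from variables named OMPI_MCA_<param>.
export OMPI_MCA_io=romio314

# Equivalent one-off form on the command line
# (./my_io_test and -np 40 are placeholders for your own job):
#   mpirun --mca io romio314 -np 40 ./my_io_test

# Optionally list the io components this installation provides,
# to confirm that romio314 is actually available.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep "MCA io" || true
fi
```

The environment variable is convenient inside batch scripts, since the mpirun line itself stays unchanged while you compare io components between runs.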