Gilles,

I think the semantics of MPI_File_close do not necessarily mandate an MPI_Barrier, at least based on that text snippet. However, I think what the Barrier does in this scenario is 'hide' a consequence of an implementation detail. So the MPI standard might not mandate a Barrier synchronization, but the actual implementation does.

(By the way, I have not yet had the time to think through the different mechanisms that ompio offers for shared file pointers, and whether all of them truly require a Barrier for the same reason.) I hope to get to that soon.

Thanks

Edgar


On 5/31/2016 9:33 PM, Gilles Gouaillardet wrote:

Edgar,


this is the bug reported at http://www.open-mpi.org/community/lists/users/2016/05/29333.php


now i am having some second thoughts about it ...

per the MPI_File_close man page :

"MPI_File_close first synchronizes file state, then closes the file associated with fh.

MPI_File_close is a collective routine. The user is responsible for ensuring that all outstanding requests associated with fh have completed before calling MPI_File_close."


does this imply MPI_File_close() internally performs an MPI_Barrier() ?

or am i over-interpreting the man page ?


My point is: if all tasks but one call MPI_File_close() *before* the remaining one calls MPI_File_write_at(), there is really nothing to flush, and although MPI_File_close() is a collective routine (just like MPI_Bcast()), that does not necessarily mean it has MPI_Barrier() semantics.


Cheers,


Gilles


On 5/31/2016 11:18 PM, Edgar Gabriel wrote:

Just for my understanding, which bug in ompio are you referring to? I am only aware of a single (pretty minor) pending issue in the 2.x series.

Thanks

Edgar


On 5/31/2016 1:28 AM, Gilles Gouaillardet wrote:

Thanks for the report.

the romio included in the v1.10 series is a bit old and did not include the fix,

i made PR #1206 for that http://www.open-mpi.org/community/lists/users/2016/05/29333.php

feel free to manually apply the patch available at https://github.com/open-mpi/ompi-release/commit/a0ea9fb6cbe4cf71567c9fc7fd8f4be384617ad4.diff


note that the issue is already fixed in romio of the v2.x series and master.

that being said, the default io module in the v2.x series and master is ompio, and it is currently buggy, so if you are using one of these, you need to run

mpirun --mca io romio314 ...

for the time being


Cheers,


Gilles


On 5/31/2016 2:27 PM, Cihan Altinay wrote:
Hello list,

I recently upgraded my distribution-supplied OpenMPI packages (Debian) from 1.6.5 to 1.10.2, and the attached test is no longer guaranteed to produce the expected output.
In plain English what the test is doing is:
1) open a file in parallel (all on the same local ext3/4 filesystem),
2) use MPI_File_write_at() or MPI_File_write_shared() to write to it,
3) close the file using MPI_File_close(),
4) then check the file size (either by stat(), or by fseek+ftell)
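
For reference, step 4 can be sketched roughly as follows. This is not the attached test itself; the helper names size_by_stat and size_by_ftell are made up for illustration, and both simply return -1 on failure.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size in bytes as reported by stat(), or -1 on failure. */
long size_by_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Size in bytes found by seeking to the end of the file, or -1 on failure. */
long size_by_ftell(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (f == NULL)
        return -1;
    size = (fseek(f, 0L, SEEK_END) == 0) ? ftell(f) : -1;
    fclose(f);
    return size;
}
```

Each rank calls one of these right after MPI_File_close() returns and compares the result with the total number of bytes written by all ranks.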

My reading of the standard is that MPI_File_close() is a collective operation, so I should reliably get the correct file size in step 4. However, while this worked with version 1.6.5 and with Intel MPI, it is no longer the case with the current OpenMPI version. I was able to confirm the same behaviour on a fresh Ubuntu 16.04 install in a VM.
The more ranks I use the more likely I get a wrong file size.

Is there anything I'm missing or is this a regression?

Thanks,
Cihan



_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29333.php


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
--


_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29335.php


