Rob,

Thanks for sharing your Jumpshot experiment demonstrating my point of view. I really appreciate the result!
Pascal

From: Rob Latham <r...@mcs.anl.gov>
To: Open MPI Users <us...@open-mpi.org>
Date: 05/07/2011 17:34
Subject: Re: [OMPI users] File seeking with shared filepointer issues
Sent by: users-boun...@open-mpi.org

On Mon, Jun 27, 2011 at 03:20:36PM +0200, pascal.dev...@bull.net wrote:
>
> Christian,
>
> Suppose you have N processes calling the first MPI_File_get_position_shared().
>
> Some of them are running faster and could execute the call to
> MPI_File_seek_shared() before all the others have got their file position.
> (Note that the "collective" primitive is not a synchronization. In that
> case, all parameters are broadcast to process 0 and checked by process 0.
> All the other processes are not blocked.)
>
> So the slow processes can get the file position that has just been
> modified by the faster ones.
>
> That is the reason why, in your program, it is necessary to synchronize all
> processes just before the call to MPI_File_seek_shared().

There's this tool "Jumpshot" that's fun to use but does have a bit of a learning curve: it just presents so much data that it can be hard to make sense of it. Still, I like using Jumpshot, and this seemed like a good chance to demonstrate Pascal's point about timings: I've attached a Jumpshot trace of an 8-processor run of Christian's test case.

- I've built ROMIO to record not only the MPI-IO calls but the underlying POSIX I/O calls as well.
- Then, I enabled display of just the shared file pointer operations and the POSIX calls.

Sorry if anyone is color blind.

color  / call
purple / MPI_File_get_position_shared
pink   / MPI_File_seek_shared
orange / POSIX open
green  / POSIX close
blue   / POSIX write

The attached trace shows:
- Rank 0 exits MPI_File_get_position_shared relatively quickly.
- Rank 0 enters MPI_File_seek_shared before anyone else.
- The blue bar is where rank 0 writes the new value of the shared file pointer.
- Rank 0 did so before any other process read the value of the shared file pointer (the green bar).

Anyway, this is all known behavior. Collecting the traces seemed like a fun way to spend the last hour on Friday before the long (USA) weekend :>

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

[attachment "shared_file_ptr_jumpshot.png" removed by Pascal Deveze/FR/BULL]

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users