On Mon, Nov 16, 2009 at 11:20:44AM +0100, Christoph Rackwitz wrote:
> It's been ten days now. I'd like to resurrect this, in case someone
> can help and just missed it.
Hi. I only check in on the OpenMPI list periodically, so sorry for the delay.

The standard in no way requires any overlap for either the nonblocking communication or I/O routines. There have been long and heated discussions about "strict" versus "weak" interpretations of the progress rule and which one is "better".

If you want asynchronous nonblocking I/O, you might have to roll all the way back to LAM or MPICH-1.2.7, when ROMIO used its own request objects and its own test/wait routines on top of the aio routines. In order to use standard request objects and the standard test/wait routines, ROMIO switched to generalized requests. However, it's difficult to make progress on generalized requests without using threads, so we do all the work when the operation is posted and, as you observe, MPI_Wait() discovers immediately that the job is complete. I proposed an extension to MPI generalized requests a few years ago that would make them more amenable to libraries like ROMIO.

Today's systems have a ton of cores, so spawning an I/O thread is not such an onerous burden. But we don't spawn such a thread in ROMIO, and so nonblocking I/O is not asynchronous.

What if you moved your MPI_File_write call into a thread? There are several ways to do this: you could, for example, use standard generalized requests and make progress with your own thread -- the application writer has a lot more knowledge than ROMIO about the system and how best to allocate threads.

If I may ask a slightly different question: you've got periods of I/O and periods of computation. Have you evaluated collective I/O? I know you are eager to hide I/O in the background -- to get it for free -- but there's no such thing as a free lunch. Background I/O might still perturb your computation phase, unless you make zero MPI calls during that phase. Collective I/O can bring some fairly powerful optimizations to the table and reduce your overall I/O costs, perhaps even enough that you no longer miss true asynchronous I/O?

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
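P.S. In case it's useful, here is a rough sketch of the "write from a thread" idea, just to make it concrete. The names (write_job, do_write, example) are made up for illustration, and it assumes you initialized MPI with MPI_Init_thread() and were granted MPI_THREAD_MULTIPLE, or at least that no other MPI calls happen while the write is in flight:

    /* Sketch only: overlap the write with computation by handing it to
     * a helper pthread.  "fh", "buf", and "count" stand in for your
     * real file handle and data. */
    #include <mpi.h>
    #include <pthread.h>

    struct write_job {
        MPI_File  fh;
        double   *buf;
        int       count;
    };

    static void *do_write(void *arg)
    {
        struct write_job *job = arg;
        MPI_Status status;
        /* blocking write, but it blocks only this helper thread */
        MPI_File_write(job->fh, job->buf, job->count, MPI_DOUBLE, &status);
        return NULL;
    }

    void example(MPI_File fh, double *buf, int count)
    {
        struct write_job job = { fh, buf, count };
        pthread_t tid;

        pthread_create(&tid, NULL, do_write, &job);

        /* ... computation phase goes here; don't reuse "buf" until the join ... */

        pthread_join(tid, NULL);   /* the write has completed (at MPI's level) here */
    }

The collective alternative would be to have every process call MPI_File_write_all (or MPI_File_write_at_all) during your I/O phase and let ROMIO's two-phase optimization do the heavy lifting.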