On Oct 17, 2008, at 6:03 PM, Nick Collier wrote:
And under some conditions, I get the error:
[3] [belafonte.home:04938] *** An error occurred in MPI_Wait
[3] [belafonte.home:04938] *** on communicator MPI_COMM_WORLD
[3] [belafonte.home:04938] *** MPI_ERR_TRUNCATE: message truncated
[3] [belafonte.home:04938] *** MPI_ERRORS_ARE_FATAL (goodbye)
When I do get the error, tracking the send and receive counts shows
them as equal. And what I don't understand is that the
receive_complete function in the above executes and the recv Struct
actually contains the data that was sent. So, I'm confused about the
error and what it's trying to tell me as it looks like everything
worked OK.
Perhaps it's a race condition? Remember that MPI_Wait triggers OMPI's
general progression engine -- so it may not be *this* receive that is
the problem; it could be some other pending receive that has a
mismatched send/receive length.
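
To make that concrete, here is a toy model in plain Python. It is not
MPI and not how Open MPI is implemented; the names ToyProgressEngine,
post_recv, deliver and TruncateError are made up for illustration. It
only sketches the idea that waiting on one request advances every
pending receive, so a truncation on a different receive can surface
from a wait whose own receive completed correctly.

```python
class TruncateError(Exception):
    """Stands in for MPI_ERR_TRUNCATE in this toy model."""


class PendingRecv:
    """A posted receive: a tag to match and a fixed buffer capacity."""

    def __init__(self, tag, capacity):
        self.tag = tag
        self.capacity = capacity  # number of elements the buffer can hold
        self.data = None          # filled in once a message is matched


class ToyProgressEngine:
    """Hypothetical stand-in for a progress engine shared by all requests."""

    def __init__(self):
        self.pending = []   # posted receives that have not completed yet
        self.incoming = []  # (tag, payload) messages that have arrived

    def post_recv(self, tag, capacity):
        request = PendingRecv(tag, capacity)
        self.pending.append(request)
        return request

    def deliver(self, tag, payload):
        self.incoming.append((tag, payload))

    def wait(self, request):
        # Progress *every* arrived message, not only the one that
        # belongs to the request being waited on.
        while self.incoming:
            tag, payload = self.incoming.pop(0)
            match = next((r for r in self.pending if r.tag == tag), None)
            if match is None:
                continue  # toy simplification: ignore unmatched messages
            if len(payload) > match.capacity:
                raise TruncateError(
                    "tag %d: got %d elements, buffer holds %d"
                    % (tag, len(payload), match.capacity)
                )
            match.data = payload
            self.pending.remove(match)
        return request.data


engine = ToyProgressEngine()
a = engine.post_recv(tag=1, capacity=4)  # sizes match the send below
b = engine.post_recv(tag=2, capacity=2)  # too small for the send below
engine.deliver(1, [10, 20, 30, 40])
engine.deliver(2, [1, 2, 3])

try:
    engine.wait(a)  # we only asked about request a
    error = None
except TruncateError as exc:
    error = str(exc)

print(a.data)  # request a already holds the data that was sent
print(error)   # the error names tag 2, i.e. request b
```

In the model, request a ends up holding exactly what was sent, yet the
wait on it still fails because of request b. If something similar is
happening in your code, checking the counts of every outstanding
receive, not just the one passed to MPI_Wait, may be worth a try.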
--
Jeff Squyres
Cisco Systems