Thanks very much for your reply!

To Sir Jeff Squyres:
I think it fails due to truncation errors. I am now logging information about
each send and receive to find out the reason.

To Sir Nick Papior Andersen:
Oh, wait (MPI_Wait) is never called in my code. What I do is call MPI_Irecv
once and then call MPI_Test several times to check whether new messages are
available. If new messages are available, some functions to process these
messages are called. I will add the wait call and check the results.

On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen <nickpap...@gmail.com> wrote:

> In complement to Jeff, I would add that using asynchronous messages
> REQUIRES that you wait (mpi_wait) for all messages at some point. Even
> though this might not seem obvious, it is due to memory allocations "behind
> the scenes" which are only de-allocated upon completion through a wait
> statement.
>
>
> 2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>
>> On Sep 18, 2014, at 2:43 AM, XingFENG <xingf...@cse.unsw.edu.au> wrote:
>>
>> > a. How to get more information about errors? I got errors like below.
>> > This says that the program exited abnormally in function MPI_Test(). But
>> > is there a way to know more about the error?
>> >
>> > *** An error occurred in MPI_Test
>> > *** on communicator MPI_COMM_WORLD
>> > *** MPI_ERR_TRUNCATE: message truncated
>> > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>
>> For the purpose of this discussion, let's take a simplification that you
>> are sending and receiving the same datatypes (e.g., you're sending MPI_INT
>> and you're receiving MPI_INT).
>>
>> This error means that you tried to receive a message with too small a
>> buffer.
>>
>> Specifically, MPI says that if you send a message that is X elements long
>> (e.g., 20 MPI_INTs), then the matching receive must be Y elements, where
>> Y>=X (e.g., *at least* 20 MPI_INTs). If the receiver provides a Y where
>> Y<X, this is a truncation error.
>>
>> Unfortunately, Open MPI doesn't report a whole lot more information about
>> these kinds of errors than what you're seeing, sorry.
>>
>> > b. Is there anything to note about asynchronous communication? I use
>> > MPI_Isend, MPI_Irecv, MPI_Test to implement asynchronous communication. My
>> > program works well on small data sets (10K-node graphs), but it exits
>> > abnormally on large data sets (1M-node graphs).
>>
>> Is it failing due to truncation errors, or something else?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/09/25344.php
>>
>
>
>
> --
> Kind regards Nick
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25345.php
>

--
Best Regards.