Thanks very much for your reply!

To Sir Jeff Squyres:

I think it fails due to truncation errors. I am now logging the details of
each send and receive to find out why.




To Sir Nick Papior Andersen:

Oh, MPI_Wait is never called in my code.

What I do is call MPI_Irecv once and then call MPI_Test several times to
check whether a new message has arrived. When one has arrived, I call some
functions to process it.

I will add the MPI_Wait calls and check the results.

On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen <nickpap...@gmail.com>
wrote:

> In complement to Jeff, I would add that using asynchronous messages
> REQUIRES that you wait (mpi_wait) for all messages at some point. Even
> though this might not seem obvious it is due to memory allocation "behind
> the scenes" which are only de-allocated upon completion through a wait
> statement.
>
>
> 2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>
> On Sep 18, 2014, at 2:43 AM, XingFENG <xingf...@cse.unsw.edu.au> wrote:
>>
>> > a. How to get more information about errors? I got errors like below.
>> > This says that the program exited abnormally in function MPI_Test().
>> > But is there a way to know more about the error?
>> >
>> > *** An error occurred in MPI_Test
>> > *** on communicator MPI_COMM_WORLD
>> > *** MPI_ERR_TRUNCATE: message truncated
>> > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>
>> For the purpose of this discussion, let's take a simplification that you
>> are sending and receiving the same datatypes (e.g., you're sending MPI_INT
>> and you're receiving MPI_INT).
>>
>> This error means that you tried to receive a message with too small a
>> buffer.
>>
>> Specifically, MPI says that if you send a message that is X elements long
>> (e.g., 20 MPI_INTs), then the matching receive must be Y elements, where
>> Y>=X (e.g., *at least* 20 MPI_INTs).  If the receiver provides a Y where
>> Y<X, this is a truncation error.
>>
>> Unfortunately, Open MPI doesn't report a whole lot more information about
>> these kinds of errors than what you're seeing, sorry.
>>
>> > b. Is there anything to note about asynchronous communication? I use
>> > MPI_Isend, MPI_Irecv, MPI_Test to implement asynchronous communication.
>> > My program works well on small data sets (10K-node graphs), but it exits
>> > abnormally on large data sets (1M-node graphs).
>>
>> Is it failing due to truncation errors, or something else?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/09/25344.php
>>
>
>
>
> --
> Kind regards Nick
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25345.php
>



-- 
Best Regards.