Thanks for your advice. I added ascending tags to the messages, but that
didn't work either.

For example, after 103043 send/receive operations, the sender sends an int 78
with tag 206086, followed by 78 bytes of data with tag 206087.
On the receiver side, it receives an int 41 with tag 206086. (Actually, 41
is the length of the *next* message to be sent by the sender.)
Hence, it allocates a buffer of length 41. However, 78 bytes of data
arrive, so it exits with the error MPI_ERR_TRUNCATE: message truncated.
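The length/payload pairing described above can be sketched as follows. This is a toy simulation, not the actual code; helper names like `receive_framed` and `recv_matching` are invented. Each length goes out on a base tag and its payload on base_tag + 1, so the receiver can never pair a length with the wrong payload, as long as it matches the payload receive by the paired tag rather than by arrival order:

```python
# Toy model of MPI-style tag matching for the length-prefix protocol in
# this thread. Messages queue in an "inbox" and are matched by tag, not
# by arrival order (mirroring how MPI matches receives to sends).

def recv_matching(inbox, tag):
    """Pop the first queued message carrying `tag` (MPI-style tag matching)."""
    for i, (t, _data) in enumerate(inbox):
        if t == tag:
            return inbox.pop(i)[1]
    raise LookupError(f"no message with tag {tag}")

def receive_framed(inbox, base_tag):
    """Receive one length/payload pair: length on base_tag, payload on base_tag + 1."""
    length = recv_matching(inbox, base_tag)
    payload = recv_matching(inbox, base_tag + 1)
    if len(payload) > length:
        # This is the situation that surfaces as MPI_ERR_TRUNCATE:
        # the posted buffer is smaller than the arriving message.
        raise OverflowError("MPI_ERR_TRUNCATE: message truncated")
    return payload

# Two framed messages in flight at once; the paired tags keep each
# length attached to its own payload even though both are outstanding.
inbox = [(206086, 78), (206087, b"x" * 78), (206088, 41), (206089, b"y" * 41)]
first = receive_framed(inbox, 206086)
second = receive_framed(inbox, 206088)
```

In real MPI an alternative is to drop the separate length message entirely: MPI_Probe (or MPI_Iprobe) on the expected source and tag, then MPI_Get_count on the returned status to size the buffer before posting the receive.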



On Fri, Sep 19, 2014 at 1:55 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> There is no guarantee that the messages will be received in the same
> order that they were sent.
> Use tags or another mechanism to match the messages on send and recv ends.
>
> On 09/18/2014 10:42 AM, XingFENG wrote:
>
>> I have found something strange.
>>
>> Basically, in my code, processes send and receive variable-length
>> messages to/from others asynchronously. When sending, a process first
>> sends the length of the message and then its content. When receiving, a
>> process first receives the length, then allocates the buffer and
>> receives the content of the message.
>>
>> However, at some point (say, after 150708 communications), some process
>> receives a wrong length (say, 170 instead of 445) and the process exits
>> abnormally. Has anyone had a similar experience?
>>
>> On Thu, Sep 18, 2014 at 10:07 PM, XingFENG <xingf...@cse.unsw.edu.au
>> <mailto:xingf...@cse.unsw.edu.au>> wrote:
>>
>>     Thank you for your reply! I am still working on my codes. I would
>>     update the post when I fix bugs.
>>
>>     On Thu, Sep 18, 2014 at 9:48 PM, Nick Papior Andersen
>>     <nickpap...@gmail.com <mailto:nickpap...@gmail.com>> wrote:
>>
>>         I just checked: if the tests return "Received" for all messages,
>>         it will not go into a memory burst.
>>         Hence, doing MPI_Test will be enough. :)
>>
>>         Hence, if at any time the MPI layer is notified of the
>>         success of a send/recv, it will clean up. This makes sense :)
>>
>>         See the updated code.
>>
>>         2014-09-18 13:39 GMT+02:00 Tobias Kloeffel
>>         <tobias.kloef...@fau.de <mailto:tobias.kloef...@fau.de>>:
>>
>>             OK, I have to wait until tomorrow; they have some problems
>>             with the network...
>>
>>
>>
>>
>>             On 09/18/2014 01:27 PM, Nick Papior Andersen wrote:
>>
>>>             I am not sure whether MPI_Test will cover this... You should
>>>             check it...
>>>
>>>
>>>             I attach here my example script, which shows two working
>>>             cases and one that is not working (you can check the memory
>>>             usage simultaneously and see that the first two work; the
>>>             last one goes ballistic in memory).
>>>
>>>             Just check it with test to see if it works...
>>>
>>>
>>>             2014-09-18 13:20 GMT+02:00 XingFENG
>>>             <xingf...@cse.unsw.edu.au <mailto:xingf...@cse.unsw.edu.au
>>> >>:
>>>
>>>                 Thanks very much for your reply!
>>>
>>>                 To Sir Jeff Squyres:
>>>
>>>                 I think it fails due to truncation errors. I am now
>>>                 logging information of each send and receive to find
>>>                 out the reason.
>>>
>>>
>>>
>>>
>>>                 To Sir Nick Papior Andersen:
>>>
>>>                 Oh, wait (MPI_Wait) is never called in my code.
>>>
>>>                 What I do is call MPI_Irecv once. Then MPI_Test is
>>>                 called several times to check whether new messages are
>>>                 available. If new messages are available, some
>>>                 functions to process these messages are called.
>>>
>>>                 I will add the wait function and check the running
>>>                 results.
>>>
>>>                 On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen
>>>                 <nickpap...@gmail.com <mailto:nickpap...@gmail.com>>
>>>                 wrote:
>>>
>>>                     In complement to Jeff, I would add that using
>>>                     asynchronous messages REQUIRES that you wait
>>>                     (MPI_Wait) for all messages at some point. Even
>>>                     though this might not seem obvious, it is due to
>>>                     memory allocated "behind the scenes" which is
>>>                     only de-allocated upon completion through a wait
>>>                     statement.
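The allocation behaviour described above can be sketched as follows. This is a toy model with invented names (`ToyMpiLayer`, `deliver_all`), not MPI's real internals: each nonblocking operation allocates a request object that the library frees only when MPI_Wait is called, or when MPI_Test observes completion; requests that are never tested or waited on accumulate.

```python
# Toy model: why every nonblocking request must eventually be completed
# via a wait or a successful test.

class Request:
    def __init__(self):
        self.complete = False

class ToyMpiLayer:
    def __init__(self):
        self.pending = []          # allocations made "behind the scenes"

    def isend(self):
        req = Request()
        self.pending.append(req)   # allocated when the send is posted
        return req

    def deliver_all(self):
        # Pretend the network has finished transferring everything.
        for req in self.pending:
            req.complete = True

    def test(self, req):
        # Like MPI_Test: non-blocking; frees the request only on success.
        if req.complete and req in self.pending:
            self.pending.remove(req)
            return True
        return False

mpi = ToyMpiLayer()
reqs = [mpi.isend() for _ in range(1000)]
mpi.deliver_all()
leaked_without_test = len(mpi.pending)   # all 1000 still allocated
for r in reqs:
    mpi.test(r)                          # each success frees its request
leaked_after_test = len(mpi.pending)     # drained to zero
```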
>>>
>>>
>>>                     2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres)
>>>                     <jsquy...@cisco.com <mailto:jsquy...@cisco.com>>:
>>>
>>>                         On Sep 18, 2014, at 2:43 AM, XingFENG
>>>                         <xingf...@cse.unsw.edu.au
>>>                         <mailto:xingf...@cse.unsw.edu.au>> wrote:
>>>
>>>                         > a. How to get more information about errors?
>>>                         I got errors like the one below. It says that
>>>                         the program exited abnormally in function
>>>                         MPI_Test(). But is there a way to know more
>>>                         about the error?
>>>                         >
>>>                         > *** An error occurred in MPI_Test
>>>                         > *** on communicator MPI_COMM_WORLD
>>>                         > *** MPI_ERR_TRUNCATE: message truncated
>>>                         > *** MPI_ERRORS_ARE_FATAL: your MPI job will
>>>                         now abort
>>>
>>>                         For the purpose of this discussion, let's take
>>>                         a simplification that you are sending and
>>>                         receiving the same datatypes (e.g., you're
>>>                         sending MPI_INT and you're receiving MPI_INT).
>>>
>>>                         This error means that you tried to receive a
>>>                         message with too small a buffer.
>>>
>>>                         Specifically, MPI says that if you send a
>>>                         message that is X elements long (e.g., 20
>>>                         MPI_INTs), then the matching receive must be Y
>>>                         elements, where Y>=X (e.g., *at least* 20
>>>                         MPI_INTs).  If the receiver provides a Y where
>>>                         Y<X, this is a truncation error.
>>>
>>>                         Unfortunately, Open MPI doesn't report a whole
>>>                         lot more information about these kinds of
>>>                         errors than what you're seeing, sorry.
>>>
>>>                         > b. Is there anything to note about
>>>                         asynchronous communication? I use MPI_Isend,
>>>                         MPI_Irecv, MPI_Test to implement asynchronous
>>>                         communication. My program works well on small
>>>                         data sets (10K-node graphs), but it exits
>>>                         abnormally on large data sets (1M-node graphs).
>>>
>>>                         Is it failing due to truncation errors, or
>>>                         something else?
>>>
>>>                         --
>>>                         Jeff Squyres
>>>                         jsquy...@cisco.com <mailto:jsquy...@cisco.com>
>>>                         For corporate legal information go to:
>>>                         http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>                         _______________________________________________
>>>                         users mailing list
>>>                         us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>                         Subscription:
>>>                         http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>                         Link to this post:
>>>                         http://www.open-mpi.org/community/lists/users/2014/09/25344.php
>>>
>>>
>>>
>>>
>>>                     --
>>>                     Kind regards Nick
>>>
>>>                     _______________________________________________
>>>                     users mailing list
>>>                     us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>                     Subscription:
>>>                     http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>                     Link to this post:
>>>                     http://www.open-mpi.org/community/lists/users/2014/09/25345.php
>>>
>>>
>>>
>>>
>>>                 --
>>>                 Best Regards.
>>>
>>>                 _______________________________________________
>>>                 users mailing list
>>>                 us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>                 Subscription:
>>>                 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>                 Link to this post:
>>>                 http://www.open-mpi.org/community/lists/users/2014/09/25346.php
>>>
>>>
>>>
>>>
>>>             --
>>>             Kind regards Nick
>>>
>>>
>>>             _______________________________________________
>>>             users mailing list
>>>             us...@open-mpi.org <mailto:us...@open-mpi.org>
>>>             Subscription:
>>>             http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>             Link to this post:
>>>             http://www.open-mpi.org/community/lists/users/2014/09/25347.php
>>>
>>
>>
>>             _______________________________________________
>>             users mailing list
>>             us...@open-mpi.org <mailto:us...@open-mpi.org>
>>             Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>             Link to this post:
>>             http://www.open-mpi.org/community/lists/users/2014/09/25348.php
>>
>>
>>
>>
>>         --
>>         Kind regards Nick
>>
>>         _______________________________________________
>>         users mailing list
>>         us...@open-mpi.org <mailto:us...@open-mpi.org>
>>         Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>         Link to this post:
>>         http://www.open-mpi.org/community/lists/users/2014/09/25349.php
>>
>>
>>
>>
>>     --
>>     Best Regards.
>>
>>
>>
>>
>> --
>> Best Regards.
>> ---
>> Xing FENG
>> PhD Candidate
>> Database Research Group
>>
>> School of Computer Science and Engineering
>> University of New South Wales
>> NSW 2052, Sydney
>>
>> Phone: (+61) 413 857 288
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25354.php
>>
>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25357.php
>



-- 
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288
