Well, if it is the next message, then I guess you have a bug: your counter
is not consistent.
I am pretty sure the error is on your side; I do something similar but have
never experienced anything like that. :)

2014-09-19 3:21 GMT+02:00 XingFENG <xingf...@cse.unsw.edu.au>:

> Thanks for your advice. I added ascending tags to the messages, but that
> didn't work either.
>
> For example, after 103043 communications, the sender sends an int 78 with
> tag 206086, followed by 78 bytes of data with tag 206087. On the receiver
> side, however, an int 41 arrives with tag 206086 (41 is actually the
> length of the *next* message the sender will send). The receiver therefore
> allocates a 41-byte buffer, but 78 bytes of data arrive, so it exits with
> the error MPI_ERR_TRUNCATE: message truncated.
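>
> For concreteness, a sketch of the scheme (made-up names, error handling
> omitted; this is not my exact code):
>
>     /* sender: message k = length (tag 2k), then payload (tag 2k+1) */
>     int len = 78;                             /* payload size in bytes */
>     MPI_Send(&len, 1, MPI_INT, dest, 2*k, MPI_COMM_WORLD);
>     MPI_Send(payload, len, MPI_BYTE, dest, 2*k+1, MPI_COMM_WORLD);
>
>     /* receiver: the length received with tag 2k sizes the buffer for
>        the payload expected with tag 2k+1 */
>     int len;
>     MPI_Recv(&len, 1, MPI_INT, src, 2*k, MPI_COMM_WORLD,
>              MPI_STATUS_IGNORE);
>     char *buf = malloc(len);
>     MPI_Recv(buf, len, MPI_BYTE, src, 2*k+1, MPI_COMM_WORLD,
>              MPI_STATUS_IGNORE);
>
> One thing I still need to check: the MPI standard only guarantees tag
> values up to 32767 (the real limit is the MPI_TAG_UB attribute), so
> ever-growing tags like 206087 might exceed the portable range.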
>
>
>
> On Fri, Sep 19, 2014 at 1:55 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> There is no guarantee that messages will be received in the same order
>> they were sent (in particular when they can come from different senders).
>> Use tags or another mechanism to match the messages on the send and
>> receive ends.
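>>
>> For instance (a sketch, not your actual code; the tag names are made up):
>> giving the length and the payload distinct tags guarantees that a receive
>> posted for a length can never match a payload message:
>>
>>     #define TAG_LEN  0   /* hypothetical tag for length messages  */
>>     #define TAG_DATA 1   /* hypothetical tag for payload messages */
>>
>>     MPI_Send(&len, 1, MPI_INT, dest, TAG_LEN, MPI_COMM_WORLD);
>>     MPI_Send(data, len, MPI_BYTE, dest, TAG_DATA, MPI_COMM_WORLD);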
>>
>> On 09/18/2014 10:42 AM, XingFENG wrote:
>>
>>> I have found something strange.
>>>
>>> Basically, in my code, processes asynchronously send and receive
>>> variable-length messages to/from each other. When sending a message, a
>>> process first sends the length of the message and then its content.
>>> When receiving, a process first receives the length; then it allocates
>>> a buffer and receives the content of the message.
>>>
>>> However, at some point (say, after 150708 communications), some process
>>> receives a wrong length (say, 170 instead of 445) and exits abnormally.
>>> Has anyone had a similar experience?
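>>>
>>> Sketched with made-up names, the receive side of the pattern is:
>>>
>>>     int len;                                 /* 1) receive the length */
>>>     MPI_Recv(&len, 1, MPI_INT, src, tag, MPI_COMM_WORLD,
>>>              MPI_STATUS_IGNORE);
>>>     char *buf = malloc(len);                 /* 2) allocate a buffer  */
>>>     MPI_Recv(buf, len, MPI_BYTE, src, tag, MPI_COMM_WORLD,
>>>              MPI_STATUS_IGNORE);             /* 3) receive the content */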
>>>
>>> On Thu, Sep 18, 2014 at 10:07 PM, XingFENG <xingf...@cse.unsw.edu.au> wrote:
>>>
>>>     Thank you for your reply! I am still working on my code. I will
>>>     update the thread when I fix the bugs.
>>>
>>>     On Thu, Sep 18, 2014 at 9:48 PM, Nick Papior Andersen
>>>     <nickpap...@gmail.com> wrote:
>>>
>>>         I just checked: if the tests return "Received" for all
>>>         messages, it will not go into a memory burst. Hence doing
>>>         MPI_Test will be enough. :)
>>>
>>>         So, if at any time the MPI layer is notified of the success of
>>>         a send/recv, it will clean up. This makes sense. :)
>>>
>>>         See the updated code.
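>>>
>>>         Roughly, what I mean (a sketch with assumed names): once
>>>         MPI_Test reports completion, the request is released by MPI
>>>         itself, so no extra MPI_Wait is needed for that request.
>>>
>>>             MPI_Request req;
>>>             int done = 0;
>>>             MPI_Isend(buf, n, MPI_BYTE, dest, tag, MPI_COMM_WORLD, &req);
>>>             while (!done)
>>>                 MPI_Test(&req, &done, MPI_STATUS_IGNORE);
>>>             /* req is now MPI_REQUEST_NULL; its resources are freed */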
>>>
>>>         2014-09-18 13:39 GMT+02:00 Tobias Kloeffel <tobias.kloef...@fau.de>:
>>>
>>>             OK, I have to wait until tomorrow; they have some problems
>>>             with the network...
>>>
>>>
>>>
>>>
>>>             On 09/18/2014 01:27 PM, Nick Papior Andersen wrote:
>>>
>>>>             I am not sure whether test will cover this... You should
>>>>             check it...
>>>>
>>>>
>>>>             I attach here my example script, which shows two working
>>>>             cases and one that does not work (you can watch the memory
>>>>             usage simultaneously and see that the first two work while
>>>>             the last one goes ballistic in memory).
>>>>
>>>>             Just check it with test to see if it works...
>>>>
>>>>
>>>>             2014-09-18 13:20 GMT+02:00 XingFENG <xingf...@cse.unsw.edu.au>:
>>>>
>>>>                 Thanks very much for your reply!
>>>>
>>>>                 To Jeff Squyres:
>>>>
>>>>                 I think it fails due to truncation errors. I am now
>>>>                 logging each send and receive to find out the reason.
>>>>
>>>>
>>>>
>>>>
>>>>                 To Nick Papior Andersen:
>>>>
>>>>                 Oh, wait (MPI_Wait) is never called in my code.
>>>>
>>>>                 What I do is call MPI_Irecv once. Then MPI_Test is
>>>>                 called several times to check whether new messages
>>>>                 are available. If new messages are available, some
>>>>                 functions to process these messages are called.
>>>>
>>>>                 I will add the wait call and check the results.
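>>>>
>>>>                 (Roughly, with made-up names: post one MPI_Irecv, poll
>>>>                 it with MPI_Test, and re-post after each completed
>>>>                 message:)
>>>>
>>>>                 MPI_Request req;
>>>>                 int flag, len;
>>>>                 MPI_Irecv(&len, 1, MPI_INT, src, tag,
>>>>                           MPI_COMM_WORLD, &req);
>>>>                 for (;;) {
>>>>                     MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
>>>>                     if (!flag) continue;      /* or do other work    */
>>>>                     process_message(len);     /* hypothetical helper */
>>>>                     MPI_Irecv(&len, 1, MPI_INT, src, tag,
>>>>                               MPI_COMM_WORLD, &req);
>>>>                 }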
>>>>
>>>>                 On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen
>>>>                 <nickpap...@gmail.com> wrote:
>>>>
>>>>                     In complement to Jeff, I would add that using
>>>>                     asynchronous messages REQUIRES that you wait
>>>>                     (MPI_Wait) for all messages at some point. Even
>>>>                     though this might not seem obvious, it is due to
>>>>                     memory allocated "behind the scenes", which is only
>>>>                     de-allocated upon completion through a wait
>>>>                     statement.
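>>>>
>>>>                     For example (a sketch, assuming an array of
>>>>                     outstanding requests):
>>>>
>>>>                         MPI_Request reqs[NREQ];
>>>>                         /* ... NREQ MPI_Isend/MPI_Irecv calls ... */
>>>>                         MPI_Waitall(NREQ, reqs, MPI_STATUSES_IGNORE);
>>>>                         /* all requests are complete and their hidden
>>>>                            buffers have been released */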
>>>>
>>>>
>>>>                     2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres)
>>>>                     <jsquy...@cisco.com>:
>>>>
>>>>                         On Sep 18, 2014, at 2:43 AM, XingFENG
>>>>                         <xingf...@cse.unsw.edu.au> wrote:
>>>>
>>>>                         > a. How can I get more information about
>>>>                         errors? I got errors like the ones below. This
>>>>                         says that the program exited abnormally in
>>>>                         MPI_Test(), but is there a way to learn more
>>>>                         about the error?
>>>>                         >
>>>>                         > *** An error occurred in MPI_Test
>>>>                         > *** on communicator MPI_COMM_WORLD
>>>>                         > *** MPI_ERR_TRUNCATE: message truncated
>>>>                         > *** MPI_ERRORS_ARE_FATAL: your MPI job will
>>>>                         now abort
>>>>
>>>>                         For the purposes of this discussion, let's
>>>>                         assume that you are sending and receiving the
>>>>                         same datatypes (e.g., you're sending MPI_INT
>>>>                         and you're receiving MPI_INT).
>>>>
>>>>                         This error means that you tried to receive a
>>>>                         message with too small a buffer.
>>>>
>>>>                         Specifically, MPI says that if you send a
>>>>                         message that is X elements long (e.g., 20
>>>>                         MPI_INTs), then the matching receive must be Y
>>>>                         elements, where Y >= X (e.g., *at least* 20
>>>>                         MPI_INTs). If the receiver provides a Y where
>>>>                         Y < X, this is a truncation error.
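>>>>
>>>>                         For example (a sketch; names assumed):
>>>>
>>>>                             int sendbuf[20], recvbuf[64], n;
>>>>                             MPI_Status st;
>>>>                             /* sender (rank 0) */
>>>>                             MPI_Send(sendbuf, 20, MPI_INT, 1, 0,
>>>>                                      MPI_COMM_WORLD);
>>>>                             /* receiver (rank 1): 64 >= 20 is legal;
>>>>                                a count of 10 would give
>>>>                                MPI_ERR_TRUNCATE */
>>>>                             MPI_Recv(recvbuf, 64, MPI_INT, 0, 0,
>>>>                                      MPI_COMM_WORLD, &st);
>>>>                             MPI_Get_count(&st, MPI_INT, &n); /* n==20 */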
>>>>
>>>>                         Unfortunately, Open MPI doesn't report a whole
>>>>                         lot more information about these kinds of
>>>>                         errors than what you're seeing, sorry.
>>>>
>>>>                         > b. Is there anything to note about
>>>>                         asynchronous communication? I use MPI_Isend,
>>>>                         MPI_Irecv, and MPI_Test to implement it. My
>>>>                         program works well on small data sets (10K-node
>>>>                         graphs) but exits abnormally on large data sets
>>>>                         (1M-node graphs).
>>>>
>>>>                         Is it failing due to truncation errors, or
>>>>                         something else?
>>>>
>>>>                         --
>>>>                         Jeff Squyres
>>>>                         jsquy...@cisco.com
>>>>                         For corporate legal information go to:
>>>>                         http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                     --
>>>>                     Kind regards Nick
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 --
>>>>                 Best Regards.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>             --
>>>>             Kind regards Nick
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>         --
>>>         Kind regards Nick
>>>
>>>
>>>
>>>
>>>
>>>     --
>>>     Best Regards.
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards.
>>> ---
>>> Xing FENG
>>> PhD Candidate
>>> Database Research Group
>>>
>>> School of Computer Science and Engineering
>>> University of New South Wales
>>> NSW 2052, Sydney
>>>
>>> Phone: (+61) 413 857 288
>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
>
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
>
> Phone: (+61) 413 857 288
>
>



-- 
Kind regards Nick
