Well, if it is the next message, then I guess you have a bug: your counter is
not consistent. I am pretty sure the error is on your side; I do something
similar but have never experienced anything like that. :)
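Roughly, the pattern I have in mind looks like this (a minimal sketch, not
your actual code; the counter name and the tag arithmetic are just an
example):

  /* Length-then-payload with one tag pair per message. msg_count must be
   * advanced identically on both sides, and tag values must stay below
   * MPI_TAG_UB. */
  #include <mpi.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      int rank, msg_count = 0;        /* bumped once per logical message */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                /* sender */
          char payload[] = "hello";
          int len = (int)strlen(payload);
          MPI_Send(&len, 1, MPI_INT, 1, 2 * msg_count, MPI_COMM_WORLD);
          MPI_Send(payload, len, MPI_CHAR, 1, 2 * msg_count + 1,
                   MPI_COMM_WORLD);
          msg_count++;
      } else if (rank == 1) {         /* receiver */
          int len;
          MPI_Recv(&len, 1, MPI_INT, 0, 2 * msg_count, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          char *buf = malloc(len);    /* sized from the matching length */
          MPI_Recv(buf, len, MPI_CHAR, 0, 2 * msg_count + 1, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          free(buf);
          msg_count++;
      }

      MPI_Finalize();
      return 0;
  }

If either side ever advances its counter inconsistently (for example, two
different send paths reusing the same value), a later length can pair with
the wrong payload, which would produce exactly the mismatch you describe: a
length of 41 matched against a 78-byte payload.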
2014-09-19 3:21 GMT+02:00 XingFENG <xingf...@cse.unsw.edu.au>:

> Thanks for your advice. I added tags to messages in ascending order, but
> it didn't work either.
>
> For example, after 103043 communications, the sender sends an int 78 with
> tag 206086, followed by 78 bytes of data with tag 206087. The receiver,
> however, receives an int 41 with tag 206086 (41 is actually the length of
> the next message to be sent by the sender). It therefore allocates a
> buffer of length 41, but 78 bytes of data arrive, so it exits with
> MPI_ERR_TRUNCATE: message truncated.
>
> On Fri, Sep 19, 2014 at 1:55 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> There is no guarantee that the messages will be received in the same
>> order that they were sent.
>> Use tags or another mechanism to match the messages on the send and recv
>> ends.
>>
>> On 09/18/2014 10:42 AM, XingFENG wrote:
>>
>>> I have found something strange.
>>>
>>> Basically, in my code, processes asynchronously send and receive
>>> variable-length messages to/from each other. When sending, a process
>>> first sends the length of the message and then its content. When
>>> receiving, a process first receives the length, then allocates the
>>> buffer and receives the content of the message.
>>>
>>> However, at some point (say, after 150708 communications), some process
>>> receives a wrong length (say, 170 instead of 445) and exits abnormally.
>>> Has anyone had a similar experience?
>>>
>>> On Thu, Sep 18, 2014 at 10:07 PM, XingFENG <xingf...@cse.unsw.edu.au>
>>> wrote:
>>>
>>> Thank you for your reply! I am still working on my code. I will update
>>> the post when I fix the bugs.
>>>
>>> On Thu, Sep 18, 2014 at 9:48 PM, Nick Papior Andersen
>>> <nickpap...@gmail.com> wrote:
>>>
>>> I just checked: if the tests return "Received" for all messages, it
>>> will not go into a memory burst. Hence, doing MPI_Test will be
>>> enough. :)
>>>
>>> So if at any time the MPI layer is notified about the success of a
>>> send/recv, it will clean up. This makes sense. :)
>>>
>>> See the updated code.
>>>
>>> 2014-09-18 13:39 GMT+02:00 Tobias Kloeffel <tobias.kloef...@fau.de>:
>>>
>>> OK, I have to wait until tomorrow; they have some problems with the
>>> network...
>>>
>>> On 09/18/2014 01:27 PM, Nick Papior Andersen wrote:
>>>
>>>> I am not sure whether test will cover this... You should check it...
>>>>
>>>> I attach here my example script, which shows two working cases and one
>>>> that does not work (you can check the memory usage simultaneously and
>>>> see that the first two work; the last one goes ballistic in memory).
>>>>
>>>> Just check it with test to see if it works...
>>>>
>>>> 2014-09-18 13:20 GMT+02:00 XingFENG <xingf...@cse.unsw.edu.au>:
>>>>
>>>> Thanks very much for your reply!
>>>>
>>>> To Jeff Squyres:
>>>>
>>>> I think it fails due to truncation errors. I am now logging
>>>> information about each send and receive to find out the reason.
>>>>
>>>> To Nick Papior Andersen:
>>>>
>>>> Oh, wait (mpi_wait) is never called in my code.
>>>>
>>>> What I do is call MPI_Irecv once. Then MPI_Test is called several
>>>> times to check whether new messages are available. If new messages
>>>> are available, some functions to process these messages are called.
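>>>>
>>>> In outline, the receive side looks something like this (a simplified
>>>> sketch with placeholder names, not my actual code):
>>>>
>>>>   #include <mpi.h>
>>>>   #include <stdlib.h>
>>>>
>>>>   /* Trivial stubs standing in for the real graph-processing work. */
>>>>   static void process_message(const char *buf, int len) { }
>>>>   static void do_other_work(void) { }
>>>>
>>>>   /* Receive one length-prefixed message from 'src'. The length is
>>>>    * posted with MPI_Irecv once and polled with MPI_Test; a successful
>>>>    * MPI_Test completes (and frees) the request, so no MPI_Wait
>>>>    * follows. */
>>>>   static void poll_message(int src, int tag)
>>>>   {
>>>>       int len, flag = 0;
>>>>       MPI_Request req;
>>>>       MPI_Irecv(&len, 1, MPI_INT, src, tag, MPI_COMM_WORLD, &req);
>>>>
>>>>       while (!flag) {
>>>>           MPI_Test(&req, &flag, MPI_STATUS_IGNORE); /* check, no block */
>>>>           if (!flag)
>>>>               do_other_work();                      /* keep computing */
>>>>       }
>>>>
>>>>       char *buf = malloc(len);   /* buffer sized from received length */
>>>>       MPI_Recv(buf, len, MPI_CHAR, src, tag + 1, MPI_COMM_WORLD,
>>>>                MPI_STATUS_IGNORE);
>>>>       process_message(buf, len);
>>>>       free(buf);
>>>>   }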
>>>> I will add the wait function and check the running results.
>>>>
>>>> On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen
>>>> <nickpap...@gmail.com> wrote:
>>>>
>>>> As a complement to Jeff's answer, I would add that using asynchronous
>>>> messages REQUIRES that you wait (mpi_wait) for all messages at some
>>>> point. Even though this might not seem obvious, it is due to memory
>>>> allocated "behind the scenes", which is only de-allocated upon
>>>> completion through a wait statement.
>>>>
>>>> 2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres) <jsquy...@cisco.com>:
>>>>
>>>> On Sep 18, 2014, at 2:43 AM, XingFENG <xingf...@cse.unsw.edu.au> wrote:
>>>>
>>>> > a. How can I get more information about errors? I got errors like the
>>>> > ones below. They say that the program exited abnormally in
>>>> > MPI_Test(). But is there a way to learn more about the error?
>>>> >
>>>> > *** An error occurred in MPI_Test
>>>> > *** on communicator MPI_COMM_WORLD
>>>> > *** MPI_ERR_TRUNCATE: message truncated
>>>> > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>>
>>>> For the purposes of this discussion, let's simplify and assume that
>>>> you are sending and receiving the same datatypes (e.g., you're sending
>>>> MPI_INT and receiving MPI_INT).
>>>>
>>>> This error means that you tried to receive a message with too small a
>>>> buffer.
>>>>
>>>> Specifically, MPI says that if you send a message that is X elements
>>>> long (e.g., 20 MPI_INTs), then the matching receive must provide Y
>>>> elements, where Y >= X (e.g., *at least* 20 MPI_INTs). If the receiver
>>>> provides a Y where Y < X, this is a truncation error.
>>>>
>>>> Unfortunately, Open MPI doesn't report a whole lot more information
>>>> about these kinds of errors than what you're seeing, sorry.
>>>>
>>>> > b. Is there anything to note about asynchronous communication? I use
>>>> > MPI_Isend, MPI_Irecv, and MPI_Test to implement asynchronous
>>>> > communication. My program works well on small data sets (10K-node
>>>> > graphs), but it exits abnormally on large data sets (1M-node graphs).
>>>>
>>>> Is it failing due to truncation errors, or something else?
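>>>>
>>>> For example, a toy program like the following (illustrative only, not
>>>> from your code) reproduces exactly that abort: rank 0 sends 20 ints
>>>> and rank 1 only provides room for 10.
>>>>
>>>>   /* Illustrative only: X = 20 sent elements, Y = 10 received, Y < X,
>>>>    * so the job aborts with MPI_ERR_TRUNCATE under the default
>>>>    * MPI_ERRORS_ARE_FATAL handler. */
>>>>   #include <mpi.h>
>>>>
>>>>   int main(int argc, char **argv)
>>>>   {
>>>>       int rank, data[20] = {0};
>>>>       MPI_Init(&argc, &argv);
>>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>       if (rank == 0) {
>>>>           MPI_Send(data, 20, MPI_INT, 1, 0, MPI_COMM_WORLD); /* X = 20 */
>>>>       } else if (rank == 1) {
>>>>           int buf[10];                                       /* Y = 10 */
>>>>           MPI_Recv(buf, 10, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>>>                    MPI_STATUS_IGNORE);
>>>>       }
>>>>       MPI_Finalize();
>>>>       return 0;
>>>>   }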
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>> --
>>>> Kind regards Nick
>
> --
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
>
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
>
> Phone: (+61) 413 857 288

--
Kind regards Nick