There is no guarantee that messages will be received in the same
order they were sent.
Use tags or another mechanism to match messages on the send and
receive ends.

On 09/18/2014 10:42 AM, XingFENG wrote:
I have found something strange.

Basically, in my code, processes send and receive variable-length
messages to/from each other asynchronously. When sending a message, a
process first sends the length of the message and then its content.
When receiving, a process first receives the length, then allocates
the buffer and receives the content of the message.

However, at some point (say, after 150708 communications), some
process receives a wrong length (say, 170 instead of 445) and the
process exits abnormally. Has anyone had a similar experience?

On Thu, Sep 18, 2014 at 10:07 PM, XingFENG <xingf...@cse.unsw.edu.au> wrote:

    Thank you for your reply! I am still working on my code. I will
    update the post when I fix the bugs.

    On Thu, Sep 18, 2014 at 9:48 PM, Nick Papior Andersen
    <nickpap...@gmail.com> wrote:

        I just checked: if the tests return "Received" for all
        messages, it will not go into a memory burst.
        Hence doing MPI_Test will be enough. :)

        Hence, if at any time the MPI layer is notified of the
        success of a send/recv, it will clean up. This makes sense :)

        See the updated code.

        2014-09-18 13:39 GMT+02:00 Tobias Kloeffel <tobias.kloef...@fau.de>:

            OK, I have to wait until tomorrow; they have some problems
            with the network...




            On 09/18/2014 01:27 PM, Nick Papior Andersen wrote:
            I am not sure whether test will cover this... You should
            check it...


            I attach here my example script, which shows two working
            cases and one not working (you can check the memory
            usage simultaneously and see that the first two work; the
            last one goes ballistic in memory).

            Just check it with test to see if it works...


            2014-09-18 13:20 GMT+02:00 XingFENG <xingf...@cse.unsw.edu.au>:

                Thanks very much for your reply!

                To Jeff Squyres:

                I think it fails due to truncation errors. I am now
                logging information for each send and receive to find
                out the reason.

                To Nick Papior Andersen:

                Oh, wait (MPI_Wait) is never called in my code.

                What I do is call MPI_Irecv once. Then MPI_Test is
                called several times to check whether new messages
                are available. If new messages are available, some
                functions to process these messages are called.

                I will add the wait function and check the running
                results.

                On Thu, Sep 18, 2014 at 8:47 PM, Nick Papior Andersen
                <nickpap...@gmail.com> wrote:

                    In complement to Jeff, I would add that using
                    asynchronous messages REQUIRES that you wait
                    (MPI_Wait) for all messages at some point. Even
                    though this might not seem obvious, it is due to
                    memory allocations "behind the scenes" which are
                    only de-allocated upon completion through a wait
                    statement.


                    2014-09-18 12:36 GMT+02:00 Jeff Squyres (jsquyres)
                    <jsquy...@cisco.com>:

                        On Sep 18, 2014, at 2:43 AM, XingFENG
                        <xingf...@cse.unsw.edu.au> wrote:

                        > a. How can I get more information about
                        > errors? I got errors like the ones below.
                        > This says that the program exited abnormally
                        > in MPI_Test(). But is there a way to know
                        > more about the error?
                        >
                        > *** An error occurred in MPI_Test
                        > *** on communicator MPI_COMM_WORLD
                        > *** MPI_ERR_TRUNCATE: message truncated
                        > *** MPI_ERRORS_ARE_FATAL: your MPI job will
                        > now abort

                        For the purposes of this discussion, let's
                        make the simplifying assumption that you are
                        sending and receiving the same datatypes
                        (e.g., you're sending MPI_INT and you're
                        receiving MPI_INT).

                        This error means that you tried to receive a
                        message with too small a buffer.

                        Specifically, MPI says that if you send a
                        message that is X elements long (e.g., 20
                        MPI_INTs), then the matching receive must be Y
                        elements, where Y >= X (e.g., *at least* 20
                        MPI_INTs).  If the receiver provides a Y where
                        Y < X, this is a truncation error.

                        Unfortunately, Open MPI doesn't report a whole
                        lot more information about these kinds of
                        errors than what you're seeing, sorry.

                        > b. Is there anything to note about
                        > asynchronous communication? I use MPI_Isend,
                        > MPI_Irecv, and MPI_Test to implement
                        > asynchronous communication. My program works
                        > well on small data sets (10K-node graphs),
                        > but it exits abnormally on large data sets
                        > (1M-node graphs).

                        Is it failing due to truncation errors, or
                        something else?

                        --
                        Jeff Squyres
                        jsquy...@cisco.com
                        For corporate legal information go to:
                        http://www.cisco.com/web/about/doing_business/legal/cri/

                        _______________________________________________
                        users mailing list
                        us...@open-mpi.org
                        Subscription:
                        http://www.open-mpi.org/mailman/listinfo.cgi/users
                        Link to this post:
                        http://www.open-mpi.org/community/lists/users/2014/09/25344.php




                    --
                    Kind regards Nick





                --
                Best Regards.






















--
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288



