I think the following paragraph might be useful. It's in MPI-3, page 142,
lines 16-20:

"The type-matching conditions for the collective operations are more strict than the corresponding conditions between sender and receiver in point-to-point. Namely, for collective operations, the amount of data sent must exactly match the amount of data specified by the receiver. Different type maps (the layout in memory, see Section 4.1) between sender and receiver are still allowed".

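To make that concrete, here is a minimal C++ sketch (my own example, not
taken from the standard) in which the amounts and type signatures match
exactly but the two sides use different type maps: every rank sends 4
contiguous MPI_INT, and the root receives each contribution through a
strided derived datatype.

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendbuf[4] = {rank, rank, rank, rank};

    // Receiver-side type map: 4 ints, one every 2 int slots (stride 2).
    // The type signature is still 4 x MPI_INT, so it matches the send side.
    MPI_Datatype strided;
    MPI_Type_vector(4, 1, 2, MPI_INT, &strided);
    MPI_Type_commit(&strided);

    // The extent of 'strided' is 7 ints, so the root needs 7*size ints.
    std::vector<int> recvbuf(rank == 0 ? 7 * size : 0, -1);

    // 4 MPI_INT sent per rank, 1 'strided' received per rank: same amount
    // of data, different layout in memory, which is legal.
    MPI_Gather(sendbuf, 4, MPI_INT,
               recvbuf.data(), 1, strided, 0, MPI_COMM_WORLD);

    MPI_Type_free(&strided);
    MPI_Finalize();
    return 0;
}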

Thanks
Edgar

On 4/8/2015 9:30 AM, Ralph Castain wrote:
In the interim, perhaps another way of addressing this would be to ask:
what happens when you run your reproducer with MPICH? Does that work?

This would at least tell us how another implementation interpreted that
function.


On Apr 7, 2015, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:

I’m afraid we’ll have to get someone from the Forum to interpret
(Howard is a member as well), but here is what I see just below that,
in the description section:

"The type signature associated with sendcounts[j], sendtype at
process i must be equal to the type signature associated
with recvcounts[i], recvtype at process j. This implies that the
amount of data sent must be equal to the amount of data received,
pairwise between every pair of processes."


On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:

Hello,

Thanks for your description.
I'm currently doing an allToAll() prior to the allToAllV(), to
communicate the lengths of the expected messages.
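
In C terms, the pattern is roughly the following (a minimal sketch of my
own, not my actual code; exchange_with_counts is a made-up helper name
and the data are assumed to be ints):

#include <mpi.h>
#include <vector>

// Exchange the per-destination send counts first, then size the
// alltoallv receive side to exactly match what each peer announced.
void exchange_with_counts(MPI_Comm comm,
                          const std::vector<int> &sendcounts,
                          const std::vector<int> &sendbuf) {
    int size;
    MPI_Comm_size(comm, &size);

    // Step 1: every rank tells every other rank how many ints to expect.
    std::vector<int> recvcounts(size);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
                 recvcounts.data(), 1, MPI_INT, comm);

    // Step 2: build displacements and receive exactly the announced amounts.
    std::vector<int> sdispls(size, 0), rdispls(size, 0);
    for (int i = 1; i < size; ++i) {
        sdispls[i] = sdispls[i - 1] + sendcounts[i - 1];
        rdispls[i] = rdispls[i - 1] + recvcounts[i - 1];
    }
    std::vector<int> recvbuf(rdispls[size - 1] + recvcounts[size - 1]);

    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT,
                  comm);
}

The extra allToAll adds one more round of communication, but it guarantees
that the receive counts exactly match what each peer sends.
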
But I still strongly believe that the right implementation of this
method should behave as I originally expected.
If you check the MPI specification here:

http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
Page 170
Line 14

It is mentioned that "... the number of elements that CAN be
received ...", which implies that the actual received message may
be shorter than that.

In cases where it is mandatory to have the same value, the modal
"MUST" is used; for example, page 171, line 1 says
"... sendtype at process i MUST be equal to the type signature ...".

So I would expect that any consistent implementation of the MPI
specification would handle this message-length matching by itself, as
I asked originally.

Thanks,
-- HR

On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

    Hi HR,

    Sorry for not noticing the receive side earlier, but as Ralph
    implied earlier in this thread, the MPI standard has stricter
    type matching for collectives than for point-to-point.  Namely,
    the number of bytes the receiver expects to receive from a given
    sender in the alltoallv must match the number of bytes sent by
    the sender.

    You were just getting lucky with the older Open MPI.  The error
    message isn't so great, though.  It's likely that in the newer
    Open MPI you are using a collective algorithm for alltoallv that
    assumes your app is obeying the standard.

    You are correct that if the ranks don't know how much data will
    be sent to them from each rank, you will need some mechanism for
    exchanging this info prior to the alltoallv op.

    Howard


    2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:

        Hello,

        If I set the size2 values according to your suggestion, which
        are the same values as on the sending nodes, it works fine.
        But by definition it does not need to be exactly the same as
        the length of the sent data; it is just a maximum length of
        data expected to be received. Otherwise, it is inevitable to
        run an allToAll() first to communicate the data sizes and then
        do the main allToAllV(), which is expensive and unnecessary
        communication overhead.

        I just created a reproducer in C++ which gives the error
        under Open MPI 1.8.4 but runs correctly under Open MPI 1.5.4.
        (I've not included the Java version of this reproducer, which
        I think is not important since the C++ version is enough to
        reproduce the error; in any case, it is straightforward to
        convert this code to Java.)

        Thanks,
        -- HR

        On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:

            That would imply that the issue is in the underlying C
            implementation in OMPI, not the Java bindings. The
            reproducer would definitely help pin it down.

            If you change the size2 values to the ones we sent you,
            does the program by chance work?


            On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:

            I'll try that as well.
            Meanwhile, I found that my C++ code runs fine on a
            machine running Open MPI 1.5.4, but I receive the same
            error under Open MPI 1.8.4 for both Java and C++.

            On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:

                Hello HR,

                Thanks!  If you have Java 1.7 installed on your
                system would you mind trying to test against that
                version too?

                Thanks,

                Howard


                2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:

                    Hello,

                    1. I'm using Java/Javac version 1.8.0_20 under
                    OS X 10.10.2.

                    2. I have used the following configuration for
                    making Open MPI:

                    ./configure --enable-mpi-java \
                        --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands" \
                        --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers" \
                        --prefix="/users/hamidreza/openmpi-1.8.4"

                    make all install

                    3. From a logical point of view, size2 is the
                    maximum amount of data expected to be received,
                    and the actual data received might be less than
                    this maximum.

                    4. I will try to prepare a working reproducer of
                    my error and send it to you.

                    Thanks,
                    -- HR

                    On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:

                        I've talked to the folks who wrote the Java
                        bindings. One possibility we identified is
                        that there may be an error in your code when
                        you did the translation.

                        My immediate thought is that each process
                        cannot receive more elements than were sent
                        to it. That is the reason for the truncation
                        error.

                        These are the correct values:

                        rank 0 - size2: 2,2,1,1
                        rank 1 - size2: 1,1,1,1
                        rank 2 - size2: 0,1,1,2
                        rank 3 - size2: 2,1,2,1
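
                        In other words, the correct size2 on rank i is
                        just column i of the send-count matrix. Here is
                        a tiny standalone sketch (my own illustration,
                        hard-coding the counts from your output) that
                        prints the values above:

                        #include <cstdio>

                        int main() {
                            // sends[i][j] = ints rank i sends to rank j
                            // (your subpartition_size values).
                            const int sends[4][4] = {
                                {2, 1, 0, 2},  // rank 0
                                {2, 1, 1, 1},  // rank 1
                                {1, 1, 1, 2},  // rank 2
                                {1, 1, 2, 1},  // rank 3
                            };
                            for (int i = 0; i < 4; ++i) {
                                std::printf("rank %d - size2:", i);
                                for (int j = 0; j < 4; ++j)
                                    std::printf(" %d", sends[j][i]);
                                std::printf("\n");
                            }
                            return 0;
                        }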

                        Can you check your code to see if perhaps
                        the values you are passing didn’t get
                        translated correctly from your C++ version
                        to the Java version?



                        On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

                        Hello HR,

                        It would also be useful to know which Java
                        version you are using, as well as the
                        configure options used when building Open MPI.

                        Thanks,

                        Howard



                        2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:

                            If not too much trouble, can you
                            extract just the alltoallv portion and
                            provide us with a small reproducer?


                            On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:

                            Hello,

                            I am converting an existing MPI program
                            in C++ to Java using Open MPI 1.8.4.
                            At some point I have an allToAllv() call
                            which works fine in C++ but produces an
                            error in the Java version:

                            MPI.COMM_WORLD.allToAllv(data,
                                subpartition_size, subpartition_offset,
                                MPI.INT, data2, subpartition_size2,
                                subpartition_offset2, MPI.INT);

                            Error:
                            *** An error occurred in MPI_Alltoallv
                            *** reported by process
                            [3621322753,9223372036854775811]
                            *** on communicator MPI_COMM_WORLD
                            *** MPI_ERR_TRUNCATE: message truncated
                            *** MPI_ERRORS_ARE_FATAL (processes in
                            this communicator will now abort,
                            ***    and potentially your MPI job)
                            3 more processes have sent help
                            message help-mpi-errors.txt /
                            mpi_errors_are_fatal
                            Set MCA parameter
                            "orte_base_help_aggregate" to 0 to see
                            all help / error messages

                            Here are the values for parameters:

                            data.length = 5
                            data2.length = 20

                            ---------- Rank 0 of 4 ----------
                            subpartition_offset:0,2,3,3,
                            subpartition_size:2,1,0,2,
                            subpartition_offset2:0,5,10,15,
                            subpartition_size2:5,5,5,5,
                            ----------
                            ---------- Rank 1 of 4 ----------
                            subpartition_offset:0,2,3,4,
                            subpartition_size:2,1,1,1,
                            subpartition_offset2:0,5,10,15,
                            subpartition_size2:5,5,5,5,
                            ----------
                            ---------- Rank 2 of 4 ----------
                            subpartition_offset:0,1,2,3,
                            subpartition_size:1,1,1,2,
                            subpartition_offset2:0,5,10,15,
                            subpartition_size2:5,5,5,5,
                            ----------
                            ---------- Rank 3 of 4 ----------
                            subpartition_offset:0,1,2,4,
                            subpartition_size:1,1,2,1,
                            subpartition_offset2:0,5,10,15,
                            subpartition_size2:5,5,5,5,
                            ----------

                            Again, this is code that works in the
                            C++ version.

                            Any help or advice is greatly appreciated.

                            Thanks,
                            -- HR


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335