On 08/11/2014 08:54 AM, George Bosilca wrote:
The patch related to ticket #4597 is zapping only the datatypes where
the user explicitly provided a zero count.

We can argue about LB and UB, but I have a hard time understanding the
rationale for allowing a zero count only for LB and UB. If it is required
by the standard, we can easily support it (the line in the patch has to
move a little down in the code).

ROMIO's type flattening code is primitive: the zero-length blocks for UB and LB were the only way to encode the extent of the type, without calling back into the MPI implementation's type-inquiry routines.
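
(For readers following along: a rough illustration, not ROMIO's actual code.
The deprecated MPI-1 way to fix a type's bounds was to add MPI_LB/MPI_UB
markers, which carry no data; in a flattened (offset, length) representation
they survive only as zero-length blocks at the bounds, so dropping every
zero-length block loses the extent.)

#include <mpi.h>

/* Sketch only: one int padded out to a 16-byte extent using the
 * deprecated MPI_LB/MPI_UB markers.  A flattened (offset, length) view
 * of the committed type keeps zero-length blocks at displacements 0 and
 * 16 to record the lower and upper bound. */
static MPI_Datatype make_padded_int(void)
{
    int          blocklens[3] = { 1, 1, 1 };
    MPI_Aint     disps[3]     = { 0, 0, 16 };
    MPI_Datatype types[3]     = { MPI_LB, MPI_INT, MPI_UB };
    MPI_Datatype padded;

    MPI_Type_struct(3, blocklens, disps, types, &padded);
    MPI_Type_commit(&padded);
    return padded;
}

MPI_Type_create_resized is the modern replacement, but flattening code that
predates it still sees the bounds only through these zero-length entries.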


*I* don't care how OpenMPI deals with UB and LB. It was *you* who suggested one might need to look a bit more closely at how OpenMPI's type processing handles those markers:

http://www.open-mpi.org/community/lists/users/2014/05/24325.php

==rob


   George.



On Mon, Aug 11, 2014 at 9:44 AM, Rob Latham <r...@mcs.anl.gov> wrote:



    On 08/10/2014 07:32 PM, Mohamad Chaarawi wrote:

        Update:

        George suggested that I try with the 1.8.2 rc3, and that one
        resolves the hindexed_block segfault that I was seeing with ompi.
        The I/O part now works with ompio, but needs the patches from Rob
        in ROMIO to work correctly.

        The second issue, where some processes participate in collective
        I/O with 0-sized datatypes created with hindexed and hvector, is
        still unresolved.


    I think this ticket was closed a bit too early:

    https://svn.open-mpi.org/trac/ompi/ticket/4597

    I don't know OpenMPI's type processing at all, but if it's like
    ROMIO, you cannot simply zap blocks of zero length:  some zero
    length blocks indicate upper bound and lower bound.

    Or maybe it's totally unrelated.  There was a flurry of datatype
    bugs reported against both MPICH and OpenMPI in May of this year, and
    I am sure I am confusing several issues.

    ==rob


        Thanks,
        Mohamad

        On 8/6/2014 11:50 AM, Mohamad Chaarawi wrote:

            Hi all,

            I'm seeing some problems with derived datatype construction
            and I/O with OpenMPI 1.8.1.

            I have replicated them in the attached program.
            The first issue is that MPI_Type_create_hindexed_block()
            always segfaults. Usage of this routine is commented out in
            the program. (I have a separate email thread with George and
            Edgar about this.)
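
            For reference, a rough sketch of how that MPI-3 routine is
            called (not the attached program; NUM_BLOCKS and the
            displacements here are placeholders):

            #include <mpi.h>
            #define NUM_BLOCKS 10

            /* Sketch: NUM_BLOCKS blocks of one int each, with explicit
             * byte displacements (every other int in this example). */
            static MPI_Datatype build_hindexed_block_type(void)
            {
                MPI_Aint     disps[NUM_BLOCKS];
                MPI_Datatype newtype;
                int          i;

                for (i = 0; i < NUM_BLOCKS; i++)
                    disps[i] = (MPI_Aint)(2 * i * sizeof(int));

                MPI_Type_create_hindexed_block(NUM_BLOCKS, 1, disps,
                                               MPI_INT, &newtype);
                MPI_Type_commit(&newtype);
                return newtype;
            }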

            The other issue is a segfault in MPI_File_set_view when ranks
            > 0 create their derived datatypes with count 0 and rank 0
            creates a derived datatype of count NUM_BLOCKS. If I use
            MPI_Type_contiguous to create the 0-sized file and memory
            datatypes (instead of hindexed and hvector), it works fine.
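
            In outline, the pattern is roughly the following sketch (not
            the attached hindexed_io program; NUM_BLOCKS and the layout
            are placeholders, only the file type is derived, and error
            checking is omitted):

            #include <mpi.h>
            #define NUM_BLOCKS 10

            int main(int argc, char **argv)
            {
                int          rank, count, i;
                int          buf[NUM_BLOCKS] = { 0 };
                int          blocklens[NUM_BLOCKS];
                MPI_Aint     disps[NUM_BLOCKS];
                MPI_Datatype filetype;
                MPI_File     fh;
                MPI_Status   status;

                MPI_Init(&argc, &argv);
                MPI_Comm_rank(MPI_COMM_WORLD, &rank);

                /* rank 0 writes NUM_BLOCKS ints; every other rank
                 * participates in the collective with a 0-count type */
                count = (rank == 0) ? NUM_BLOCKS : 0;
                for (i = 0; i < count; i++) {
                    blocklens[i] = 1;
                    disps[i]     = (MPI_Aint)(i * sizeof(int));
                }
                MPI_Type_create_hindexed(count, blocklens, disps,
                                         MPI_INT, &filetype);
                MPI_Type_commit(&filetype);

                MPI_File_open(MPI_COMM_WORLD, argv[1],
                              MPI_MODE_CREATE | MPI_MODE_WRONLY,
                              MPI_INFO_NULL, &fh);
                /* the reported segfault occurs here when the filetype
                 * was built with count 0 from hindexed/hvector */
                MPI_File_set_view(fh, 0, MPI_INT, filetype, "native",
                                  MPI_INFO_NULL);
                MPI_File_write_at_all(fh, 0, buf, count, MPI_INT,
                                      &status);
                MPI_File_close(&fh);

                MPI_Type_free(&filetype);
                MPI_Finalize();
                return 0;
            }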
            To replicate, run the program with 2 or more procs:

            mpirun -np 2 ./hindexed_io mpi_test_file

            [jam:15566] *** Process received signal ***
            [jam:15566] Signal: Segmentation fault (11)
            [jam:15566] Signal code: Address not mapped (1)
            [jam:15566] Failing at address: (nil)
            [jam:15566] [ 0] [0xfcd440]
            [jam:15566] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a]
            [jam:15566] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d]
            [jam:15566] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b]
            [jam:15566] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5]
            [jam:15566] [ 5] /scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]
            [jam:15566] [ 6] ./hindexed_io[0x8048aa6]
            [jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
            [jam:15566] [ 8] ./hindexed_io[0x80487e1]
            [jam:15566] *** End of error message ***

            If I use --mca io ompio with 2 or more procs, the program
            crashes in write_at_all (regardless of what routine is used
            to construct a 0-sized datatype):

            [jam:15687] *** Process received signal ***
            [jam:15687] Signal: Floating point exception (8)
            [jam:15687] Signal code: Integer divide-by-zero (1)
            [jam:15687] Failing at address: 0x3e29b7
            [jam:15687] [ 0] [0xe56440]
            [jam:15687] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc]
            [jam:15687] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a]
            [jam:15687] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650]
            [jam:15687] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]
            [jam:15687] [ 5] ./hindexed_io[0x8048b07]
            [jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
            [jam:15687] [ 7] ./hindexed_io[0x80487e1]
            [jam:15687] *** End of error message ***

            If I use MPICH 3.1.2, I don't see those issues.

            Thanks,
            Mohamad







    --
    Rob Latham
    Mathematics and Computer Science Division
    Argonne National Lab, IL USA




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/08/24973.php


--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
