The patch related to ticket #4597 is zapping only the datatypes where the
user explicitly provided a zero count.
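
For reference, a minimal sketch of the zero-count case in question: every rank
except 0 passes an explicit zero count to MPI_Type_create_hindexed. This is
illustrative only, not the attached reproducer; NUM_BLOCKS, the block sizes,
and the displacements are made up here.

  #include <mpi.h>

  #define NUM_BLOCKS 4        /* illustrative; the real reproducer may differ */
  #define BLOCK_LEN  8        /* ints per block, illustrative */

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int      blocklens[NUM_BLOCKS];
      MPI_Aint displs[NUM_BLOCKS];
      for (int i = 0; i < NUM_BLOCKS; i++) {
          blocklens[i] = BLOCK_LEN;
          displs[i]    = (MPI_Aint)i * BLOCK_LEN * sizeof(int);
      }

      /* rank 0 describes NUM_BLOCKS blocks; everyone else passes count 0 */
      int count = (rank == 0) ? NUM_BLOCKS : 0;

      MPI_Datatype ftype;
      MPI_Type_create_hindexed(count, blocklens, displs, MPI_INT, &ftype);
      MPI_Type_commit(&ftype);

      /* ... MPI_File_set_view / MPI_File_write_at_all with ftype here ... */

      MPI_Type_free(&ftype);
      MPI_Finalize();
      return 0;
  }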

We can argue about LB and UB, but I have a hard time understanding the
rationale for allowing a zero count only for LB and UB. If the standard
requires it, we can easily support it (the line in the patch just has to
move a little further down in the code).
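
For concreteness, a minimal sketch of the LB/UB side of the argument (again
illustrative, not code from the patch): the lower bound and extent of a type
are set explicitly with MPI_Type_create_resized. Whether that bound
information ends up as zero-length blocks in a given flattening depends on
the implementation (Rob's remark below suggests it does in ROMIO); if so, an
unconditional drop of zero-length blocks would lose it.

  #include <mpi.h>

  /* Sketch only: build a type whose lb and extent are set explicitly. */
  static MPI_Datatype make_resized_type(void)
  {
      MPI_Datatype vec, resized;

      /* 4 ints, one taken out of every stride of 4 ints */
      MPI_Type_vector(4, 1, 4, MPI_INT, &vec);

      /* force lb = 0 and an extent of 16 ints, independent of the data layout */
      MPI_Type_create_resized(vec, 0, (MPI_Aint)(16 * sizeof(int)), &resized);
      MPI_Type_commit(&resized);
      MPI_Type_free(&vec);
      return resized;
  }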

  George.



On Mon, Aug 11, 2014 at 9:44 AM, Rob Latham <r...@mcs.anl.gov> wrote:

>
>
> On 08/10/2014 07:32 PM, Mohamad Chaarawi wrote:
>
>> Update:
>>
>> George suggested that I try with the 1.8.2 rc3, and that one resolves the
>> hindexed_block segfault that I was seeing with ompi. The I/O part now
>> works with ompio, but needs the patches from Rob in ROMIO to work
>> correctly.
>>
>> The second issue, where some processes participate in collective I/O with
>> 0-sized datatypes created with hindexed and hvector, is still unresolved.
>>
>
> I think this ticket was closed a bit too early:
>
> https://svn.open-mpi.org/trac/ompi/ticket/4597
>
> I don't know OpenMPI's type processing at all, but if it's like ROMIO, you
> cannot simply zap blocks of zero length: some zero-length blocks indicate
> the upper bound and lower bound.
>
> Or maybe it's totally unrelated. There was a flurry of datatype bugs
> reported against both MPICH and OpenMPI in May of this year, and I am sure
> I am confusing several issues.
>
> ==rob
>
>
>> Thanks,
>> Mohamad
>>
>> On 8/6/2014 11:50 AM, Mohamad Chaarawi wrote:
>>
>>> Hi all,
>>>
>>> I'm seeing some problems with derived datatype construction and I/O
>>> with OpenMPI 1.8.1.
>>>
>>> I have replicated them in the attached program.
>>> The first issue is that MPI_Type_create_hindexed_block() always
>>> segfaults. Usage of this routine is commented out in the program. (I
>>> have a separate email thread with George and Edgar about this).
>>>
>>> The other issue is a segfault in MPI_File_set_view when I have ranks
>>> > 0 creating derived datatypes with count 0, and rank 0 creating a
>>> derived datatype of count NUM_BLOCKS. If I use MPI_Type_contiguous to
>>> create the 0-sized file and memory datatypes (instead of hindexed and
>>> hvector), it works fine.
>>> To replicate, run the program with 2 or more procs:
>>>
>>> mpirun -np 2 ./hindexed_io mpi_test_file
>>>
>>> [jam:15566] *** Process received signal ***
>>> [jam:15566] Signal: Segmentation fault (11)
>>> [jam:15566] Signal code: Address not mapped (1)
>>> [jam:15566] Failing at address: (nil)
>>> [jam:15566] [ 0] [0xfcd440]
>>> [jam:15566] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a]
>>> [jam:15566] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d]
>>> [jam:15566] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b]
>>> [jam:15566] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5]
>>> [jam:15566] [ 5] /scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]
>>> [jam:15566] [ 6] ./hindexed_io[0x8048aa6]
>>> [jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
>>> [jam:15566] [ 8] ./hindexed_io[0x80487e1]
>>> [jam:15566] *** End of error message ***
>>>
>>> If I use --mca io ompio with 2 or more procs, the program crashes in
>>> write_at_all (regardless of which routine is used to construct the
>>> 0-sized datatype):
>>>
>>> [jam:15687] *** Process received signal ***
>>> [jam:15687] Signal: Floating point exception (8)
>>> [jam:15687] Signal code: Integer divide-by-zero (1)
>>> [jam:15687] Failing at address: 0x3e29b7
>>> [jam:15687] [ 0] [0xe56440]
>>> [jam:15687] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc]
>>> [jam:15687] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a]
>>> [jam:15687] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650]
>>> [jam:15687] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]
>>> [jam:15687] [ 5] ./hindexed_io[0x8048b07]
>>> [jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
>>> [jam:15687] [ 7] ./hindexed_io[0x80487e1]
>>> [jam:15687] *** End of error message ***
>>>
>>> If I use mpich 3.1.2, I don't see those issues.
>>>
>>> Thanks,
>>> Mohamad
>>>
>>>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA