On 08/11/2014 08:54 AM, George Bosilca wrote:
The patch related to ticket #4597 zaps only the datatypes where the user explicitly provided a zero count. We can argue about LB and UB, but I have a hard time understanding the rationale for allowing a zero count only for LB and UB. If it is required by the standard, we can easily support it (the line in the patch has to move a little down in the code).
ROMIO's type-flattening code is primitive: zero-length blocks for UB and LB were the only way to encode the extent of a type without calling back into the MPI implementation's type-inquiry routines.
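For reference, here is a minimal sketch (not part of this thread's test program) of the MPI-2 replacement for those markers: MPI_Type_create_resized sets the lower bound and extent directly, instead of padding a struct type with zero-length MPI_LB/MPI_UB entries. The vector layout and the eight-int extent below are purely illustrative values.

/* sketch: set an explicit extent on a datatype with
 * MPI_Type_create_resized rather than MPI_LB/MPI_UB markers;
 * the vector layout and extent below are illustrative only */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype vec, resized;
    MPI_Aint lb, extent;

    MPI_Init(&argc, &argv);

    /* 4 blocks of 1 int with a stride of 2 ints */
    MPI_Type_vector(4, 1, 2, MPI_INT, &vec);

    /* lower bound 0, extent of 8 ints: the same information the old
     * zero-length LB/UB blocks used to carry */
    MPI_Type_create_resized(vec, 0, 8 * (MPI_Aint)sizeof(int), &resized);
    MPI_Type_commit(&resized);

    MPI_Type_get_extent(resized, &lb, &extent);
    printf("lb = %ld, extent = %ld bytes\n", (long)lb, (long)extent);

    MPI_Type_free(&vec);
    MPI_Type_free(&resized);
    MPI_Finalize();
    return 0;
}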
*I* don't care how OpenMPI deals with UB and LB. It was *you* who suggested one might need to look a bit more closely at how OpenMPI's type processing handles those markers:
http://www.open-mpi.org/community/lists/users/2014/05/24325.php

==rob
George.

On Mon, Aug 11, 2014 at 9:44 AM, Rob Latham <r...@mcs.anl.gov> wrote:

On 08/10/2014 07:32 PM, Mohamad Chaarawi wrote:

Update: George suggested that I try the 1.8.2 rc3, and that one resolves the hindexed_block segfault I was seeing with ompi. The I/O part now works with ompio, but needs the patches from Rob in ROMIO to work correctly. The second issue, with collective I/O where some processes participate with 0-sized datatypes created with hindexed and hvector, is still unresolved.

I think this ticket was closed a bit too early: https://svn.open-mpi.org/trac/ompi/ticket/4597

I don't know OpenMPI's type processing at all, but if it's like ROMIO, you cannot simply zap blocks of zero length: some zero-length blocks indicate the upper bound and lower bound. Or maybe it's totally unrelated. There was a flurry of datatype bugs reported against both MPICH and OpenMPI in May of this year, and I am sure I am confusing several issues.

==rob

Thanks,
Mohamad

On 8/6/2014 11:50 AM, Mohamad Chaarawi wrote:

Hi all,

I'm seeing some problems with derived-datatype construction and I/O with OpenMPI 1.8.1. I have replicated them in the attached program.

The first issue is that MPI_Type_create_hindexed_block() always segfaults. Usage of this routine is commented out in the program. (I have a separate email thread with George and Edgar about this.)

The other issue is a segfault in MPI_File_set_view when ranks > 0 create the derived datatypes with count 0 and rank 0 creates a derived datatype of count NUM_BLOCKS. If I use MPI_Type_contiguous to create the 0-sized file and memory datatypes (instead of hindexed and hvector), it works fine.

To replicate, run the program with 2 or more procs:

mpirun -np 2 ./hindexed_io mpi_test_file

[jam:15566] *** Process received signal ***
[jam:15566] Signal: Segmentation fault (11)
[jam:15566] Signal code: Address not mapped (1)
[jam:15566] Failing at address: (nil)
[jam:15566] [ 0] [0xfcd440]
[jam:15566] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a]
[jam:15566] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d]
[jam:15566] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b]
[jam:15566] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5]
[jam:15566] [ 5] /scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]
[jam:15566] [ 6] ./hindexed_io[0x8048aa6]
[jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15566] [ 8] ./hindexed_io[0x80487e1]
[jam:15566] *** End of error message ***

If I use --mca io ompio with 2 or more procs, the program segfaults in write_at_all (regardless of what routine is used to construct a 0-sized datatype):

[jam:15687] *** Process received signal ***
[jam:15687] Signal: Floating point exception (8)
[jam:15687] Signal code: Integer divide-by-zero (1)
[jam:15687] Failing at address: 0x3e29b7
[jam:15687] [ 0] [0xe56440]
[jam:15687] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc]
[jam:15687] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a]
[jam:15687] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650]
[jam:15687] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]
[jam:15687] [ 5] ./hindexed_io[0x8048b07]
[jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15687] [ 7] ./hindexed_io[0x80487e1]
[jam:15687] *** End of error message ***

If I use mpich 3.1.2, I don't see those issues.

Thanks,
Mohamad
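For readers without the attachment, here is a rough sketch of the kind of reproducer described above. It is a reconstruction under assumptions, not Mohamad's actual hindexed_io.c: NUM_BLOCKS and BLOCK_SIZE are made-up values, and the derived (hvector) memory type is simplified away. Rank 0 builds an hindexed filetype with NUM_BLOCKS blocks, every other rank builds one with count 0, and all ranks then call MPI_File_set_view and MPI_File_write_at_all collectively.

/* rough reconstruction of the scenario above; the real hindexed_io.c
 * attachment is not reproduced here, and NUM_BLOCKS / BLOCK_SIZE are
 * illustrative values.  The original also used a derived (hvector)
 * memory type; this sketch writes plain MPI_INTs to keep it short. */
#include <mpi.h>
#include <stdlib.h>

#define NUM_BLOCKS 4
#define BLOCK_SIZE 16          /* ints per block */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* rank 0 describes NUM_BLOCKS blocks; everyone else passes count 0 */
    int count = (rank == 0) ? NUM_BLOCKS : 0;

    int      lens[NUM_BLOCKS];
    MPI_Aint displs[NUM_BLOCKS];
    for (int i = 0; i < NUM_BLOCKS; i++) {
        lens[i]   = BLOCK_SIZE;
        displs[i] = (MPI_Aint)i * BLOCK_SIZE * sizeof(int);
    }

    MPI_Datatype ftype;
    MPI_Type_create_hindexed(count, lens, displs, MPI_INT, &ftype);
    MPI_Type_commit(&ftype);

    int *buf = calloc(NUM_BLOCKS * BLOCK_SIZE, sizeof(int));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, argv[1],   /* filename from the command line */
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* the romio path reportedly segfaults here (ADIOI_Flatten_datatype)
     * when count == 0 on some ranks */
    MPI_File_set_view(fh, 0, MPI_INT, ftype, "native", MPI_INFO_NULL);

    /* the ompio path reportedly hits a divide-by-zero in its
     * write_at_all code for the zero-sized participants */
    MPI_File_write_at_all(fh, 0, buf, count * BLOCK_SIZE, MPI_INT,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&ftype);
    free(buf);
    MPI_Finalize();
    return 0;
}

Run with two or more processes (e.g. mpirun -np 2 ./hindexed_io mpi_test_file, as in the original report) so that at least one rank takes the zero-count path.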
-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA