Thanks for the info, I updated https://github.com/open-mpi/ompi/issues/1809 accordingly.
FWIW, the bug occurs when addresses do not fit in 32 bits. For some reason, I always run into it on OS X but not on Linux, unless I use dmalloc. I replaced malloc with alloca (and removed the free), so now I always hit the bug on Linux.

Cheers,

Gilles

On Wednesday, June 22, 2016, Nicolas Joly <nj...@pasteur.fr> wrote:

> On Wed, Jun 22, 2016 at 11:58:25AM +0900, Gilles Gouaillardet wrote:
> > Nicolas,
> >
> > can you please give the attached patch a try ?
> >
> > in my environment, it fixes your test case.
>
> Yes ! It does here too ...
>
> Just patched ADIOI_NFS_WriteStrided() using the same fix. And the
> original tool that crashed first on read, and later on write with
> MPI_BOTTOM now succeed.
>
> > based on previous tests posted here, it is likely a similar bug should
> > be fixed for other filesystems.
>
> Thanks a lot.
>
> > Gilles
> >
> >
> > On 6/15/2016 12:42 AM, Nicolas Joly wrote:
> > >Hi,
> > >
> > >At work, i do have some mpi codes that make use of custom datatypes to
> > >call MPI_File_read with MPI_BOTTOM ... It mostly works, except when
> > >the underlying filesystem is NFS where if crash with SIGSEGV.
> > >
> > >The attached sample (code + data) works just fine with 1.10.1 on my
> > >NetBSD/amd64 workstation using the UFS romio backend, but crash if
> > >switched to NFS :
> > >
> > >njoly@issan [~]> mpirun --version
> > >mpirun (Open MPI) 1.10.1
> > >njoly@issan [~]> mpicc -g -Wall -o sample sample.c
> > >njoly@issan [~]> mpirun -n 2 ./sample ufs:data.txt
> > >rank1 ... 111111111133333333335555555555
> > >rank0 ... 000000000022222222224444444444
> > >njoly@issan [~]> mpirun -n 2 ./sample nfs:data.txt
> > >[issan:20563] *** Process received signal ***
> > >[issan:08879] *** Process received signal ***
> > >[issan:20563] Signal: Segmentation fault (11)
> > >[issan:20563] Signal code: Address not mapped (1)
> > >[issan:20563] Failing at address: 0xffffffffb1309240
> > >[issan:08879] Signal: Segmentation fault (11)
> > >[issan:08879] Signal code: Address not mapped (1)
> > >[issan:08879] Failing at address: 0xffffffff881b0420
> > >[issan:08879] [ 0] [issan:20563] [ 0] 0x7dafb14a52b0
> > ><__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> > >[issan:20563] *** End of error message ***
> > >0x78b9886a52b0 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> > >[issan:08879] *** End of error message ***
> > >--------------------------------------------------------------------------
> > >mpirun noticed that process rank 0 with PID 20563 on node issan exited on
> > >signal 11 (Segmentation fault).
> > >--------------------------------------------------------------------------
> > >njoly@issan [~]> gdb sample sample.core
> > >GNU gdb (GDB) 7.10.1
> > >[...]
> > >Core was generated by `sample'.
> > >Program terminated with signal SIGSEGV, Segmentation fault.
> > >#0 0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> > >[Current thread is 1 (LWP 1)]
> > >(gdb) bt
> > >#0 0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> > >#1 0x000078b974010edf in ADIOI_NFS_ReadStrided () from
> > >/usr/pkg/lib/openmpi/mca_io_romio.so
> > >#2 0x000078b97400bacf in MPIOI_File_read () from
> > >/usr/pkg/lib/openmpi/mca_io_romio.so
> > >#3 0x000078b97400bc72 in mca_io_romio_dist_MPI_File_read () from
> > >/usr/pkg/lib/openmpi/mca_io_romio.so
> > >#4 0x000078b988e72b38 in PMPI_File_read () from /usr/pkg/lib/libmpi.so.12
> > >#5 0x00000000004013a4 in main (argc=2, argv=0x7f7fff7b0f00) at sample.c:63
> > >
> > >Thanks.
> > >
> > >_______________________________________________
> > >users mailing list
> > >us...@open-mpi.org
> > >Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > >Link to this post:
> > >http://www.open-mpi.org/community/lists/users/2016/06/29434.php
> >
> > diff --git a/ompi/mca/io/romio/romio/adio/ad_nfs/ad_nfs_read.c b/ompi/mca/io/romio/romio/adio/ad_nfs/ad_nfs_read.c
> > index 16f3a4d..2577f13 100644
> > --- a/ompi/mca/io/romio/romio/adio/ad_nfs/ad_nfs_read.c
> > +++ b/ompi/mca/io/romio/romio/adio/ad_nfs/ad_nfs_read.c
> > @@ -457,13 +457,14 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count,
> >   }
> >   else {
> >   /* noncontiguous in memory as well as in file */
> > + ADIO_Offset i;
> >
> >   ADIOI_Flatten_datatype(datatype);
> >   flat_buf = ADIOI_Flatlist;
> >   while (flat_buf->type != datatype) flat_buf = flat_buf->next;
> >
> >   k = num = buf_count = 0;
> > - i = (int) (flat_buf->indices[0]);
> > + i = flat_buf->indices[0];
> >   j = st_index;
> >   off = offset;
> >   n_filetypes = st_n_filetypes;
> > @@ -508,8 +509,8 @@ void ADIOI_NFS_ReadStrided(ADIO_File fd, void *buf, int count,
> >
> >   k = (k + 1)%flat_buf->count;
> >   buf_count++;
> > - i = (int) (buftype_extent*(buf_count/flat_buf->count) +
> > - flat_buf->indices[k]);
> > + i = buftype_extent*(buf_count/flat_buf->count) +
> > + flat_buf->indices[k];
> >   new_brd_size = flat_buf->blocklens[k];
> >   if (size != frd_size) {
> >   off += size;
>
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29494.php
>
> --
> Nicolas Joly
>
> Cluster & Computing Group
> Biology IT Center
> Institut Pasteur, Paris.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29504.php