The ROMIO bundled with Open MPI is imported from an MPICH release that is not up to date.
Could you give the latest MPICH a try?

That would help figure out whether this bug has already been fixed upstream.
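
For anyone re-running the test: the `ufs:` and `nfs:` prefixes in the commands quoted below are what select the ROMIO backend for a given file name. As a rough illustration of that `prefix:name` convention only, here is a minimal sketch. The helper name `split_fs_prefix` and its behavior on odd inputs are my own assumptions; this is not ROMIO's actual code from ad_fstype.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not ROMIO's actual code): split "prefix:name" into
 * its file system prefix and the remaining file name.
 *
 * If a prefix is found, it is copied (without the colon) into `prefix` and
 * a pointer to the file name part inside `path` is returned. If there is no
 * usable prefix, `prefix` is set to "" and `path` is returned unchanged.
 */
static const char *split_fs_prefix(const char *path, char *prefix,
                                   size_t prefix_len)
{
    assert(path != NULL && prefix != NULL && prefix_len > 0);

    const char *colon = strchr(path, ':');
    prefix[0] = '\0';

    /* No colon, or a leading colon: treat the whole string as the name. */
    if (colon == NULL || colon == path)
        return path;

    size_t n = (size_t)(colon - path);
    if (n >= prefix_len)
        return path; /* prefix does not fit in the buffer: report none */

    memcpy(prefix, path, n);
    prefix[n] = '\0';
    return colon + 1;
}
```

With a 16-byte buffer `buf`, calling `split_fs_prefix("nfs:data.txt", buf, sizeof buf)` leaves `nfs` in `buf` and returns a pointer to `data.txt`; a plain `data.txt` comes back unchanged with an empty prefix.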

Cheers,

Gilles

Nicolas Joly <nj...@pasteur.fr> wrote:
>On Fri, Jun 17, 2016 at 10:15:28AM +0200, Vincent Huber wrote:
>> Dear Mr. Joly,
>> 
>> 
>> I have tried your code on my MacBook Pro (see below for details) to look
>> into that behavior.
>
>Thanks for testing.
>
>> Looking at openmpi-1.10.3/ompi/mca/io/romio/romio/adio/common/ad_fstype.c to
>> get the list of file systems I can test, I have tried the following:
>> 
>> mpirun -np 2 ./sample ufs:data.txt
>
>Works.
>
>> mpirun -np 2 ./sample nfs:data.txt
>
>Crashes with SIGSEGV in ADIOI_NFS_ReadStrided()
>
>I made a quick and dirty test, replacing ADIOI_NFS_ReadStrided() with
>ADIOI_GEN_ReadStrided() in the ADIO_NFS_operations structure
>(ad_nfs/ad_nfs.c), and this fixed the problem.
>
>> mpirun -np 2 ./sample pfs:data.txt
>> mpirun -np 2 ./sample piofs:data.txt
>> mpirun -np 2 ./sample panfs:data.txt
>> mpirun -np 2 ./sample hfs:data.txt
>> mpirun -np 2 ./sample xfs:data.txt
>> mpirun -np 2 ./sample sfs:data.txt
>> mpirun -np 2 ./sample pvfs:data.txt
>> mpirun -np 2 ./sample zoidfs:data.txt
>> mpirun -np 2 ./sample ftp:data.txt
>> mpirun -np 2 ./sample lustre:data.txt
>> mpirun -np 2 ./sample bgl:data.txt
>> mpirun -np 2 ./sample bglockless:data.txt
>
>I don't have access to these filesystems. The tool fails when
>trying to open the file, and the corresponding assert fires.
>
>> mpirun -np 2 ./sample testfs:data.txt
>
>This one crashes with SIGSEGV, but in ADIOI_Flatten().
>
>I also tried with ompio and it seems to work.
>
>mpirun --mca io ompio -np 2 ./sample data.txt
>rank0 ... 000000000022222222224444444444
>rank1 ... 111111111133333333335555555555
>
>> The only one that does not crash is ufs.
>> That is not the answer you are looking for, but it is my two cents.
>
>Thanks.
>
>>  gcc --version
>> Configured with:
>> --prefix=/Applications/Xcode.app/Contents/Developer/usr
>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>> Apple LLVM version 7.0.0 (clang-700.0.72)
>> Target: x86_64-apple-darwin15.5.0
>> Thread model: posix
>> 
>> 
>> and
>> 
>> 
>> mpirun --version
>> mpirun (Open MPI) 1.10.2
>> 
>> 
>> ?
>> 
>> 
>> 2016-06-14 17:42 GMT+02:00 Nicolas Joly <nj...@pasteur.fr>:
>> 
>> 
>> >
>> > Hi,
>> >
>> > At work, I have some MPI codes that use custom datatypes to
>> > call MPI_File_read with MPI_BOTTOM ... It mostly works, except when
>> > the underlying filesystem is NFS, where it crashes with SIGSEGV.
>> >
>> > The attached sample (code + data) works just fine with 1.10.1 on my
>> > NetBSD/amd64 workstation using the UFS romio backend, but crashes when
>> > switched to NFS:
>> >
>> > njoly@issan [~]> mpirun --version
>> > mpirun (Open MPI) 1.10.1
>> > njoly@issan [~]> mpicc -g -Wall -o sample sample.c
>> > njoly@issan [~]> mpirun -n 2 ./sample ufs:data.txt
>> > rank1 ... 111111111133333333335555555555
>> > rank0 ... 000000000022222222224444444444
>> > njoly@issan [~]> mpirun -n 2 ./sample nfs:data.txt
>> > [issan:20563] *** Process received signal ***
>> > [issan:08879] *** Process received signal ***
>> > [issan:20563] Signal: Segmentation fault (11)
>> > [issan:20563] Signal code: Address not mapped (1)
>> > [issan:20563] Failing at address: 0xffffffffb1309240
>> > [issan:08879] Signal: Segmentation fault (11)
>> > [issan:08879] Signal code: Address not mapped (1)
>> > [issan:08879] Failing at address: 0xffffffff881b0420
>> > [issan:08879] [ 0] [issan:20563] [ 0] 0x7dafb14a52b0
>> > <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
>> > [issan:20563] *** End of error message ***
>> > 0x78b9886a52b0 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
>> > [issan:08879] *** End of error message ***
>> > --------------------------------------------------------------------------
>> > mpirun noticed that process rank 0 with PID 20563 on node issan exited on
>> > signal 11 (Segmentation fault).
>> > --------------------------------------------------------------------------
>> > njoly@issan [~]> gdb sample sample.core
>> > GNU gdb (GDB) 7.10.1
>> > [...]
>> > Core was generated by `sample'.
>> > Program terminated with signal SIGSEGV, Segmentation fault.
>> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
>> > [Current thread is 1 (LWP 1)]
>> > (gdb) bt
>> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
>> > #1  0x000078b974010edf in ADIOI_NFS_ReadStrided () from
>> > /usr/pkg/lib/openmpi/mca_io_romio.so
>> > #2  0x000078b97400bacf in MPIOI_File_read () from
>> > /usr/pkg/lib/openmpi/mca_io_romio.so
>> > #3  0x000078b97400bc72 in mca_io_romio_dist_MPI_File_read () from
>> > /usr/pkg/lib/openmpi/mca_io_romio.so
>> > #4  0x000078b988e72b38 in PMPI_File_read () from /usr/pkg/lib/libmpi.so.12
>> > #5  0x00000000004013a4 in main (argc=2, argv=0x7f7fff7b0f00) at sample.c:63
>> >
>> > Thanks.
>> >
>> > --
>> > Nicolas Joly
>> >
>> > Cluster & Computing Group
>> > Biology IT Center
>> > Institut Pasteur, Paris.
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/users/2016/06/29434.php
>> >
>> 
>> -- 
>> Docteur Ingénieur de recherche
>> CeMoSiS <http://www.cemosis.fr> - vincent.hu...@cemosis.fr
>> Tel: +33 (0)3 68 85 02 06
>> IRMA - 7, rue René Descartes
>> 67 000 Strasbourg
>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/06/29476.php
>
>-- 
>Nicolas Joly
>
>Cluster & Computing Group
>Biology IT Center
>Institut Pasteur, Paris.
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2016/06/29477.php
