Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Llolsten Kaonga
Hello Adam, During the InfiniBand Plugfest 34 event last October, we found that mpirun hangs on FDR systems when run with the openib btl option. Yossi Itigin (@Mellanox) suggested that we run with the following options: --mca btl self,vader --mca pml ucx -x UCX_RC_PATH_MTU=4096
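A full command line applying those suggestions might look like the sketch below; the process count, hostfile path, and benchmark binary are placeholders rather than values taken from the original message:

    mpirun --mca btl self,vader --mca pml ucx -x UCX_RC_PATH_MTU=4096 \
           -np 6 -hostfile /path/to/hostfile ./IMB-MPI1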

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Kawashima, Takahiro
Hello Adam, IMB had a bug related to Reduce_scatter. https://github.com/intel/mpi-benchmarks/pull/11 I'm not sure this bug is the cause but you can try the patch. https://github.com/intel/mpi-benchmarks/commit/841446d8cf4ca1f607c0f24b9a424ee39ee1f569 Thanks, Takahiro Kawashima, Fujitsu >
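One way to try that fix, assuming you build the benchmarks from the GitHub sources rather than from a vendor package, is to download the referenced commit as a patch and apply it in the IMB source tree:

    # commit hash comes from the link above; the .patch suffix asks GitHub for a plain patch file
    curl -L https://github.com/intel/mpi-benchmarks/commit/841446d8cf4ca1f607c0f24b9a424ee39ee1f569.patch | patch -p1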

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Gilles Gouaillardet
Ryan, as Edgar explained, that could be a compiler issue (FWIW, I am unable to reproduce the bug). You can build Open MPI again and pass --disable-builtin-atomics to the configure command line. That being said, the "Alarm clock" message looks a bit suspicious. Does it always occur at 20+
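If you want to try that, a rebuild along these lines should be enough; the install prefix and make parallelism are illustrative, not from the original message:

    ./configure --prefix=/opt/openmpi-3.1.3 --disable-builtin-atomics
    make -j8 && make install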

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread George Bosilca
I was not able to reproduce the issue with openib on the 4.0 branch, but instead I randomly segfault in MPI_Finalize during the grdma cleanup. I could, however, reproduce the TCP timeout part with both 4.0 and master, on a pretty sane cluster (only 3 interfaces: lo, eth0, and virbr0). With no surprise, the

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Adam LeBlanc
Hello Howard, Thanks for all of the help and suggestions; I will look into them. I also realized that my Ansible wasn't set up properly for handling tar files, so the nightly build didn't even install, but I will do it by hand and will give you an update tomorrow somewhere in the afternoon. Thanks, Ad

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Howard Pritchard
Hello Adam, This helps some. Could you post the first 20 lines of your config.log? This will help in trying to reproduce. The contents of your host file (you can use generic names for the nodes if that's an issue to publicize) would also help, as the number of nodes and number of MPI processes/node im

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Adam LeBlanc
On the TCP side it doesn't segfault anymore but will time out on some tests, but on the openib side it will still segfault. Here is the output: [pandora:19256] *** Process received signal *** [pandora:19256] Signal: Segmentation fault (11) [pandora:19256] Signal code: Address not mapped (1) [pandora:1

Re: [OMPI users] [Request for Cooperation] -- MPI International Survey

2019-02-20 Thread George Bosilca
George, Thanks for letting us know about this issue; it was a misconfiguration issue with the form. I guess we did not realize it, as most of us are automatically signed in by our browsers. Anyway, thanks for the feedback; the access to the form should now be completely open. Sorry for the inconveni

Re: [OMPI users] [Request for Cooperation] -- MPI International Survey

2019-02-20 Thread George Reeke
On Wed, 2019-02-20 at 13:21 -0500, George Bosilca wrote: > To obtain representative samples of the MPI community, we have > prepared a survey > > https://docs.google.com/forms/d/e/1FAIpQLSd1bDppVODc8nB0BjIXdqSCO_MuEuNAAbBixl4onTchwSQFwg/viewform > To access the survey, I was asked to create a g

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Jeff Squyres (jsquyres) via users
Can you try the latest 4.0.x nightly snapshot and see if the problem still occurs? https://www.open-mpi.org/nightly/v4.0.x/ > On Feb 20, 2019, at 1:40 PM, Adam LeBlanc wrote: > > I do, here is the output: > > 2 total processes killed (some possibly by mpirun during cleanup) > [pandora:122
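A rough outline for testing a nightly snapshot, with the tarball name left as a placeholder since it changes from night to night (pick the newest one listed on that page):

    tar xzf openmpi-v4.0.x-<date>-<githash>.tar.gz
    cd openmpi-v4.0.x-<date>-<githash>
    ./configure --prefix=$HOME/ompi-4.0.x-nightly
    make -j8 && make install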

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Ryan Novosielski
This is what I did for my build — not much going on there: ../openmpi-3.1.3/configure --prefix=/opt/sw/packages/gcc-4_8/openmpi/3.1.3 --with-pmi && \ make -j32 We have a mixture of types of Infiniband, using the RHEL-supplied Infiniband packages.

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Gabriel, Edgar
Well, the way you describe it, it sounds to me like maybe an atomics issue with this compiler version. What was your Open MPI configure line, and what network interconnect are you using? An easy way to test this theory would be to force OpenMPI to use the tcp interfaces (everything will be sl
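A minimal sketch of forcing the TCP path for such a test, with the process count and test binary as placeholders:

    mpirun --mca btl tcp,self -np 4 ./your_parallel_hdf5_test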

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Adam LeBlanc
I do, here is the output: 2 total processes killed (some possibly by mpirun during cleanup) [pandora:12238] *** Process received signal *** [pandora:12238] Signal: Segmentation fault (11) [pandora:12238] Signal code: Invalid permissions (2) [pandora:12238] Failing at address: 0x7f5c8e31fff0 [pandor

[OMPI users] [Request for Cooperation] -- MPI International Survey

2019-02-20 Thread George Bosilca
Dear colleagues, As part of a wide-ranging effort to understand the current usage of the Message Passing Interface (MPI) in the development of parallel applications and to drive future additions to the MPI standard, an international team is seeking feedback from the largest possible MPI audience (

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Howard Pritchard
Hi Adam, As a sanity check, if you try to use --mca btl self,vader,tcp, do you still see the segmentation fault? Howard On Wed., Feb. 20, 2019 at 08:50, Adam LeBlanc < alebl...@iol.unh.edu> wrote: > Hello, > > When I do a run with OpenMPI v4.0.0 on Infiniband with this command: > mpirun --

Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3

2019-02-20 Thread Ryan Novosielski
Does it make any sense that it seems to work fine when OpenMPI and HDF5 are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5 build, I did try an XFS filesystem and it didn’t help. GPFS works fine for e

[OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread Adam LeBlanc
Hello, When I do a run with OpenMPI v4.0.0 on Infiniband with this command: mpirun --mca btl_openib_warn_no_device_params_found 0 --map-by node --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_allow_ib 1 -np 6 -hostfile /home/aleblanc/ib-mpi-hosts IMB-MP