Re: [OMPI users] Deadlock in netcdf tests

2019-10-26 Thread Orion Poplawski via users
Orion Poplawski via users
Sent: Friday, October 25, 2019 10:21 PM
To: Open MPI Users
Cc: Orion Poplawski
Subject: Re: [OMPI users] Deadlock in netcdf tests
Thanks for the response, the workaround helps. With that out of the way I see:
+ mpiexec -n 4 ./tst_parallel4
Error in ompi_io_ompio_c

Re: [OMPI users] Deadlock in netcdf tests

2019-10-26 Thread Gabriel, Edgar via users
a users
> Sent: Friday, October 25, 2019 10:21 PM
> To: Open MPI Users
> Cc: Orion Poplawski
> Subject: Re: [OMPI users] Deadlock in netcdf tests
>
> Thanks for the response, the workaround helps.
>
> With that out of the way I see:
>
> + mpiexec -n 4 ./tst_paralle

Re: [OMPI users] Deadlock in netcdf tests

2019-10-25 Thread Orion Poplawski via users
Thanks for the response, the workaround helps. With that out of the way I see:
+ mpiexec -n 4 ./tst_parallel4
Error in ompi_io_ompio_calcl_aggregator():rank_index(-2) >= num_aggregators(1)fd_size=461172966257152 off=4156705856
Error in ompi_io_ompio_calcl_aggregator():rank_index(-2) >= num

Re: [OMPI users] Deadlock in netcdf tests

2019-10-25 Thread Gabriel, Edgar via users
rs
> Sent: Friday, October 25, 2019 7:43 AM
> To: Open MPI Users
> Cc: Gabriel, Edgar
> Subject: Re: [OMPI users] Deadlock in netcdf tests
>
> Orion,
> I will look into this problem, is there a specific code or testcase that
> triggers this problem?
> Than

Re: [OMPI users] Deadlock in netcdf tests

2019-10-25 Thread Gabriel, Edgar via users
1:56 PM
> To: Open MPI Users
> Cc: Orion Poplawski
> Subject: Re: [OMPI users] Deadlock in netcdf tests
>
> On 10/24/19 9:28 PM, Orion Poplawski via users wrote:
> > Starting with netcdf 4.7.1 (and 4.7.2) in Fedora Rawhide we are seeing a
> > test hang with openmpi 4.0

Re: [OMPI users] Deadlock in netcdf tests

2019-10-25 Thread Gilles Gouaillardet via users
Orion, thanks for the report. I can confirm this is indeed an Open MPI bug. FWIW, a workaround is to disable the fcoll/vulcan component. That can be achieved by
    mpirun --mca fcoll ^vulcan ...
or
    OMPI_MCA_fcoll=^vulcan mpirun ...
I also noted the tst_parallel3 program crashes with the RO
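For anyone scripting the test run, the two forms of the workaround quoted above can be sketched in Python as follows. This is only an illustration, not part of the original message: the helper names `build_workaround_command` and `build_workaround_env` are made up here, and the program name and process count are borrowed from the `mpiexec -n 4 ./tst_parallel4` invocation elsewhere in the thread.

```python
import os
import shutil
import subprocess

# The parameter name and value come from the message above;
# the helper names below are hypothetical and purely illustrative.
MCA_PARAM = "fcoll"
MCA_VALUE = "^vulcan"  # a leading '^' excludes the named component


def build_workaround_command(program, nprocs):
    """Return an mpirun argv that passes the exclusion on the command line."""
    return ["mpirun", "--mca", MCA_PARAM, MCA_VALUE, "-n", str(nprocs), program]


def build_workaround_env(base_env=None):
    """Return a copy of the environment with OMPI_MCA_fcoll=^vulcan set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_" + MCA_PARAM] = MCA_VALUE
    return env


if __name__ == "__main__":
    cmd = build_workaround_command("./tst_parallel4", 4)
    print(" ".join(cmd))
    # Launch only if Open MPI and the test binary are actually present.
    if shutil.which("mpirun") and os.path.exists("./tst_parallel4"):
        subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids any shell interpretation of the `^` character. The environment-variable form is the more convenient of the two when the launcher is invoked indirectly, for example from a test harness, by passing `env=build_workaround_env()` to `subprocess.run`.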

Re: [OMPI users] Deadlock in netcdf tests

2019-10-24 Thread Orion Poplawski via users
On 10/24/19 9:28 PM, Orion Poplawski via users wrote:
Starting with netcdf 4.7.1 (and 4.7.2) in Fedora Rawhide we are seeing a test hang with openmpi 4.0.2.  Backtrace:
(gdb) bt
#0  0x7f90c197529b in sched_yield () from /lib64/libc.so.6
#1  0x7f90c1ac8a05 in ompi_request_default_wait ()