Orion Poplawski via users
Sent: Friday, October 25, 2019 10:21 PM
To: Open MPI Users
Cc: Orion Poplawski
Subject: Re: [OMPI users] Deadlock in netcdf tests
Thanks for the response, the workaround helps.
With that out of the way I see:
+ mpiexec -n 4 ./tst_parallel4
Error in ompi_io_ompio_calcl_aggregator():rank_index(-2) >= num_aggregators(1)fd_size=461172966257152 off=4156705856
Error in ompi_io_ompio_calcl_aggregator():rank_index(-2) >=
num
rs
> Sent: Friday, October 25, 2019 7:43 AM
> To: Open MPI Users
> Cc: Gabriel, Edgar
> Subject: Re: [OMPI users] Deadlock in netcdf tests
>
> Orion,
> I will look into this problem, is there a specific code or testcase that
> triggers this problem?
> Than
1:56 PM
> To: Open MPI Users
> Cc: Orion Poplawski
> Subject: Re: [OMPI users] Deadlock in netcdf tests
Orion,
thanks for the report.
I can confirm this is indeed an Open MPI bug.
FWIW, a workaround is to disable the fcoll/vulcan component.
That can be achieved by
mpirun --mca fcoll ^vulcan ...
or
OMPI_MCA_fcoll=^vulcan mpirun ...
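As a minimal sketch of how the environment-variable form above might be applied to the failing test from this thread (assuming a POSIX shell, and that `mpiexec` and `./tst_parallel4` are available in the current directory; the guard and the `echo` are illustrative additions, not part of the original advice):

```shell
# Exclude the fcoll/vulcan component for every Open MPI job started from this shell.
# The value is quoted as a precaution, since some shells treat '^' specially.
export OMPI_MCA_fcoll='^vulcan'
echo "OMPI_MCA_fcoll=$OMPI_MCA_fcoll"

# Re-run the test only if the launcher and the test binary are actually present
# (assumption: the binary sits in the current directory, as in the report).
if command -v mpiexec >/dev/null 2>&1 && [ -x ./tst_parallel4 ]; then
    mpiexec -n 4 ./tst_parallel4
fi
```

Exporting the variable keeps the setting in place across a whole test run, whereas the `--mca fcoll ^vulcan` form only affects the single command it is passed to.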
I also noted the tst_parallel3 program crashes with the RO
On 10/24/19 9:28 PM, Orion Poplawski via users wrote:
Starting with netcdf 4.7.1 (and 4.7.2) in Fedora Rawhide we are seeing a
test hang with openmpi 4.0.2. Backtrace:
(gdb) bt
#0 0x7f90c197529b in sched_yield () from /lib64/libc.so.6
#1 0x7f90c1ac8a05 in ompi_request_default_wait () from /usr/lib64/openmpi/lib/libmpi.so.40
#2 0x7f