In https://github.com/apache/bookkeeper/issues/3751, it report a bug: The
bookkeeper client read operation timeout.
After our research, we found that the bookkeeper client ignore the response
from bookie server. (See https://github.com/apache/bookkeeper/issues/3751
to get the detailed).

The problem is introduced by https://github.com/apache/bookkeeper/pull/3528.
It change the ReadResponse, which is used by V2 Protocol. When config the
netty memory detect policy upper than
ResourceLeakDetector.setLevel(Level.DISABLED), the bookkeeper client will
ignore the response from bookie server, so the read operation timeout.

The https://github.com/apache/bookkeeper/pull/3528 all CI tests succeed and
merged. The Test `Client Tests` shouldn't pass in theory, but it pass.
So we research the root case.
Finnaly, we found probelm which lead the ci pass.
In bookkeeper, we use custom
org.apache.bookkeeper.common.allocator.impl.ByteBufAllocatorImpl to
allocate ByteBuf, the memory detect policy config by
`org.apache.bookkeeper.conf.AbstractConfiguration#setAllocatorLeakDetectionPolicy`,
the netty jvm param `-Dio.netty.leakDetection.level` didn't work, it will
be overridden by our custom config.
In our tests, we config `-Dio.netty.leakDetection.level=paranoid`, but it
will be overridden by `disabled`.

So I suggest in our tests, both the clientConf and the serverConfig should
config
`AbstractConfiguration#setAllocatorLeakDetectionPolicy(LeakDetectionPolicy.Paranoid)`.
It can cover more cases.

Reply via email to