Very latest news: I have narrowed the problem down to ResponseEnDecoderV3#encode. If I use UnpooledByteBufAllocator.DEFAULT instead of the allocator from the channel, the error disappears.
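
As an aside on why swapping the allocator can make a symptom like this vanish: the sketch below is a toy model in plain Java, not Netty and not a diagnosis of this bug. All names in it (PooledReuseSketch, Allocator, PooledAllocator, UnpooledAllocator, pendingWriteAfterEarlyRelease) are hypothetical. It assumes a buffer is released while a write still holds a reference to it, which is only one possible failure mode. With a pool the memory is handed to the next response and the pending write sees the wrong bytes; without a pool the same mistake goes unnoticed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

public class PooledReuseSketch {

    /** Hypothetical stand-in for a buffer allocator (NOT Netty's ByteBufAllocator). */
    interface Allocator {
        byte[] allocate(int size);
        void release(byte[] buf);
    }

    /** Always returns a fresh array; release does nothing, so old contents are never overwritten. */
    static final class UnpooledAllocator implements Allocator {
        @Override public byte[] allocate(int size) { return new byte[size]; }
        @Override public void release(byte[] buf) { /* nothing to recycle */ }
    }

    /** Recycles released arrays: the next allocation of the same size gets the same memory. */
    static final class PooledAllocator implements Allocator {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        @Override public byte[] allocate(int size) {
            byte[] buf = free.poll();
            return (buf != null && buf.length == size) ? buf : new byte[size];
        }
        @Override public void release(byte[] buf) { free.push(buf); }
    }

    /** Copies the message bytes into a buffer obtained from the given allocator. */
    static byte[] encode(Allocator alloc, String msg) {
        byte[] bytes = msg.getBytes(StandardCharsets.US_ASCII);
        byte[] buf = alloc.allocate(bytes.length);
        System.arraycopy(bytes, 0, buf, 0, bytes.length);
        return buf;
    }

    /**
     * Simulates the assumed bug: the first response is released while a
     * "pending write" still references it, then a second response is encoded.
     * Returns what the pending write of the FIRST response would put on the wire.
     */
    static String pendingWriteAfterEarlyRelease(Allocator alloc) {
        byte[] first = encode(alloc, "RESPONSE-A");
        alloc.release(first);                         // assumed bug: released too early
        byte[] second = encode(alloc, "RESPONSE-B");  // may reuse first's memory
        String onWire = new String(first, StandardCharsets.US_ASCII);
        alloc.release(second);
        return onWire;
    }

    public static void main(String[] args) {
        // prints "unpooled: RESPONSE-A"
        System.out.println("unpooled: " + pendingWriteAfterEarlyRelease(new UnpooledAllocator()));
        // prints "pooled:   RESPONSE-B" (the first response was overwritten)
        System.out.println("pooled:   " + pendingWriteAfterEarlyRelease(new PooledAllocator()));
    }
}
```

If something like this is what happens, switching to unpooled buffers hides the bug rather than fixing it, so it would still be worth finding where the reference is dropped.
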
So the problem is in the encoding of the responses, using Java 9 and pooled ByteBufs. This is consistent with the errors on the client side about corrupted responses when the client is on Java 8 and the server is on Java 9.

I am now running tests with the Bookie on Java 8 and clients on Java 9, and the problem seems to be the same: I receive corrupted messages on the Bookie.

Does this ring any bells? What is different about Channel#write/ByteBuf pooling in Java 9?

Enrico

2018-03-15 5:21 GMT+01:00 Enrico Olivelli <eolive...@gmail.com>:

> Latest findings, some good news, and some very bad.
>
> Good news:
> I was wrong, I did not switch the system back to Java 8 correctly.
>
> The problem is on the Bookie side and occurs only if the bookie is on Java 9.
>
> Bad news:
> I have a fix. The fix is to use Unpooled ByteBufs in serializeProtobuf:
>
> private static ByteBuf serializeProtobuf(MessageLite msg, ByteBufAllocator allocator) {
>     int size = msg.getSerializedSize();
>     ByteBuf buf = Unpooled.buffer(size, size);
>     ...
>
> I will continue to track down the cause; I think it is on the read path
> (not sure).
>
> On the client side we have a flag to not use pooled ByteBufs on the Channel
> allocator. The most trivial fix at the moment is to do the same on the Bookie
> side, as a hotfix for branch 4.6.
>
> Before jumping to this extreme hotfix solution I will dig into the issue.
> Now that I know that the problem is ONLY on Java 9 and on the Bookie, it
> will be simpler to find a reproducer.
>
> The point remains that in other systems I have, and in test cases, there
> is no failure.
>
> Honestly I have no Java 9 bookies in production, only Java 8 bookies; maybe
> this is why no one has ever reported this problem from production.
>
> Enrico
>
> 2018-03-14 17:27 GMT+01:00 Ivan Kelly <iv...@apache.org>:
>
>> > @Ivan
>> > I wonder if some tests on Jepsen with bookie restarts may find this kind of
>> > issue, given that it is not a network/SO problem
>>
>> If Jepsen can catch it, then a normal integration test can.
>>
>> I attempted a repro for this using the integration test stuff.
>> Running for 2-3 hours in a loop, no bug hit. Perhaps I'm not doing
>> exactly what you are doing.
>>
>> https://github.com/ivankelly/bookkeeper/blob/enrico-bug/tests/integration/enrico-bug/src/test/java/org/apache/bookkeeper/tests/integration/TestEnricoBug.java
>>
>> -Ivan