By my testing using the above steps to reproduce the issue, I do not see the bug on commit [0] but I do see it on the subsequent commit [1].
That indicates [1], which is the bookkeeper client upgrade to 4.16.0 commit, introduced the problem. I need to sign off soon. I appreciate any help you can provide in looking into this issue! Thanks, Michael [0] https://github.com/apache/pulsar/commit/25af808970fc02281a0b9345c5de205654d7b789 [1] https://github.com/apache/pulsar/commit/a7a605f13565edc380b8ae01808832c426f16d1f On Thu, Apr 13, 2023 at 4:51 PM Michael Marshall <mmarsh...@apache.org> wrote: > > Here's an update on my progress. I reproduced Dave's observations with > the following steps: > > 1. Download and unpack 3.0.0 RC 1. > > 2. Add these two settings to the conf/standalone.conf: > exposingBrokerEntryMetadataToClientEnabled=true > brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor,org.apache.pulsar.common.intercept.AppendBrokerTimestampMetadataInterceptor > > 3. Run bin/pulsar standalone > > 4. Run curl http://localhost:8080/admin/v2/brokers/health > > 5. View the following log: > > 2023-04-13T16:48:39,256-0500 > [broker-topic-workers-OrderedExecutor-7-0] ERROR > org.apache.pulsar.common.protocol.Commands - > [PersistentSubscription{topic=persistent://pulsar/standalone/localhost:8080/healthcheck, > name=healthCheck-012d2e05-371f-4ddc-a666-daf9fa243d87}] [-1] Failed to > parse message metadata > java.lang.IndexOutOfBoundsException: readerIndex(100) + length(2) > exceeds writerIndex(100): UnpooledDuplicatedByteBuf(ridx: 100, widx: > 100, cap: 100, unwrapped: CompositeByteBuf(ridx: 100, widx: 100, cap: > 100, components=2)) > at > io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final] > at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) > ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final] > at > org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0] > at > org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0] > at > org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0] > at > org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0] > at > org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) > ~[org.apache.pulsar-pulsar-common-3.0.0.jar:3.0.0] > at > org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0] > at > org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0] > at > org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0] > at > org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) > ~[org.apache.pulsar-pulsar-broker-3.0.0.jar:3.0.0] > at > org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0] > at > org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) > ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0] > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final] > at java.lang.Thread.run(Thread.java:833) ~[?:?] > > Now that we have a consistent way to reproduce it, I'll work on > understanding the root cause. I believe we run with these two settings > with other versions, so this feels like a regression. I cannot say for > sure yet. > > Thanks, > Michael > > On Thu, Apr 13, 2023 at 1:47 AM Enrico Olivelli <eolive...@gmail.com> wrote: > > > > Dave, > > IIUC (from offline discussion) those tests started to fail only recently, > > as we run those tests against the master branch of apache/pulsar repo. > > > > Does the beginning of failures match the upgrade to BK 4.16 ? > > The problem may be related to the handling of ByteBufs, IIRC there are a > > few changes in the latest > > BK release. > > > > Enrico > > > > Il giorno gio 13 apr 2023 alle ore 05:02 Hang Chen <chenh...@apache.org> ha > > scritto: > > > > > Hi Dave, > > > Thanks for verifying this release candidate. I have setup the Pulsar > > > cluster with 3.0 on my laptop and test the produce and consume, it works > > > fine. We also deploy the it on k8s, the produce, consume and health check > > > also works fine. Would you please share your broker configurations? > > > > > > Thanks, > > > Hang > > > > > > Dave Fisher <wave4d...@comcast.net> 于2023年4月13日周四 06:49写道: > > > > > >> Hi - > > >> > > >> We do End to end testing of Pulsar versions on GCP and with Pulsar 3.0 > > >> today we have the following problem. > > >> > > >> This image shows the broker deployments not coming up. This is a non-TLS > > >> setup. > > >> > > >> [image: Screen Shot 2023-04-12 at 1.09.59 PM.png] > > >> > > >> The Broker Pods have errors like this one: > > >> > > >> 2023-04-12T20:10:35,995+0000 [broker-topic-workers-OrderedExecutor-3-0] > > >> ERROR org.apache.pulsar.common.protocol.Commands - > > >> [PersistentSubscription{topic=persistent://pulsar/pulsar/10.236.0.4:8080/healthcheck, > > >> name=healthCheck-04a8e22e-3e0b-4aa3-9141-d8970c1b0712}] [-1] Failed to > > >> parse message metadata > > >> java.lang.IndexOutOfBoundsException: readerIndex(96) + length(2) exceeds > > >> writerIndex(96): UnpooledDuplicatedByteBuf(ridx: 96, widx: 96, cap: 96, > > >> unwrapped: CompositeByteBuf(ridx: 96, widx: 96, cap: 96, components=2)) > > >> at > > >> io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1442) > > >> ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final] > > >> at > > >> io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:749) > > >> ~[io.netty-netty-buffer-4.1.89.Final.jar:4.1.89.Final] > > >> at > > >> org.apache.pulsar.common.protocol.Commands.skipBrokerEntryMetadataIfExist(Commands.java:1692) > > >> ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:452) > > >> ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.common.protocol.Commands.parseMessageMetadata(Commands.java:445) > > >> ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.common.protocol.Commands.peekMessageMetadata(Commands.java:1899) > > >> ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.common.protocol.Commands.peekAndCopyMessageMetadata(Commands.java:1918) > > >> ~[org.apache.pulsar-pulsar-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:142) > > >> ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.broker.service.AbstractBaseDispatcher.filterEntriesForConsumer(AbstractBaseDispatcher.java:100) > > >> ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.internalReadEntriesComplete(PersistentDispatcherSingleActiveConsumer.java:210) > > >> ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.lambda$readEntriesComplete$1(PersistentDispatcherSingleActiveConsumer.java:151) > > >> ~[org.apache.pulsar-pulsar-broker-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > > >> at > > >> org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137) > > >> ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0] > > >> at > > >> org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:113) > > >> ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0] > > >> at > > >> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > > >> ~[io.netty-netty-common-4.1.89.Final.jar:4.1.89.Final] > > >> at java.lang.Thread.run(Thread.java:833) ~[?:?] > > >> > > >> We looked into this but aren’t sure what would cause this to suddenly be > > >> an issue. The candidate would be differences in the new version of BK … > > >> > > >> Best, > > >> Dave > > >> > > >