2020-04-16 15:45:50 UTC - matt_innerspace.io: any chance we can get another
docker image for `2.5.1 rc4`? Or instructions on how to do it? cc: @Addison
Higham?
----
2020-04-16 15:50:22 UTC - Addison Higham: building the docker images is
*fairly* straightforward `mvn package -Pdocker -DskipTests` but it works best
in a new clone of the repo, their tends to be a few droppings left behind that
aren't properly cleaned up (the c++/python client build leaves around some
files on disk that need to be cleaned up)
----
2020-04-16 15:50:44 UTC - Addison Higham: that produces images with the
`apachepulsar/pulsar-all` name, I just re-tag them and push
----
2020-04-16 15:50:51 UTC - Addison Higham: but, I am kicking off a build right
now :slightly_smiling_face:
----
2020-04-16 15:51:00 UTC - Addison Higham: will push and let you know when it is
done
+1 : matt_innerspace.io
----
2020-04-16 15:51:49 UTC - matt_innerspace.io: do you do the build in the
`apachepulsar/pulsar-build` container?
----
2020-04-16 16:22:12 UTC - Addison Higham: I don't run anything in there
directly, the mvn process does use that container though, but I don't believe
any java compilation happens there
----
2020-04-16 17:53:03 UTC - Sijie Guo: @matt_innerspace.io you can try this
`docker pull streamnative/pulsar-all:v2.5.1-candidate-4`
+1 : matt_innerspace.io, Konstantinos Papalias
----
2020-04-16 18:07:22 UTC - Addison Higham: woohoo! I was having issues with a
build on mine that I got distracted on (though I think I did just figure it out)
----
2020-04-16 22:47:39 UTC - Addison Higham: FYI, deployed rc4 to both beta and
prod clusters, deployed to prod to see if it fixed the issue we are having with
producers that we don't have a reliable repro for. So far so good.
----
2020-04-17 00:12:22 UTC - Addison Higham: well... the good news, we don't see
any regressions, but the patches I submitted that I *hoped* would fix my
producers not getting cleaned up (resulting in a ton of "producer already
exists errors") didn't work (surprisingly...) but it did give some more
concrete data, thread here
----
2020-04-17 00:13:06 UTC - Addison Higham: So, I added code to ensure that
futures returned out of the broker service timeout, now I get this new error:
``` 23:24:11.077 [pulsar-io-24-3] ERROR
org.apache.pulsar.broker.service.ServerCnx - [/10.9.127.254:55768] Failed to
create topic
<persistent://notification-service-prod/bridge/message-notification-center>
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:
java.util.concurrent.CompletionException:
java.util.concurrent.TimeoutException: Future didn't finish within deadline
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
org.apache.pulsar.broker.service.BrokerService.lambda$futureWithDeadline$18(BrokerService.java:740)
~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
[io.netty-netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: Caused by:
java.util.concurrent.TimeoutException: Future didn't finish within deadline
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
org.apache.pulsar.broker.service.BrokerService.futureWithDeadline(BrokerService.java:747)
~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
org.apache.pulsar.broker.service.BrokerService.loadOrCreatePersistentTopic(BrokerService.java:839)
~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: at
org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$13(BrokerService.java:678)
~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]```
----
2020-04-17 00:13:55 UTC - Addison Higham: this confirms that
```BrokerService.loadOrCreatePersistentTopic```
----
2020-04-17 00:14:01 UTC - Addison Higham: is just stalling out for unknown
reasons
----
2020-04-17 00:15:14 UTC - Addison Higham: the extra mystery, in
`ServerCnx.java:1091` which ultimately calls into
`loadOrCreatePersistentTopic`, we have the following code that should ensure
producers are cleaned up:
``` Throwable cause = exception.getCause();
if (!(cause instanceof
ServiceUnitNotReadyException)) {
// Do not print stack traces for expected
exceptions
log.error("[{}] Failed to create topic {}",
remoteAddress, topicName, exception);
}
// If client timed out, the future would have been
completed
// by subsequent close. Send error back to
// client, only if not completed already.
if
(producerFuture.completeExceptionally(exception)) {
ctx.writeAndFlush(Commands.newError(requestId,
BrokerServiceException.getClientErrorCode(cause), cause.getMessage()));
}
producers.remove(producerId, producerFuture);
return null;```
----
2020-04-17 04:56:24 UTC - Addison Higham: okay, so I think that this change
<https://github.com/apache/pulsar/pull/6489> has introduce a minor regression
for loading topics with many replicated clusters. The timeout is simply too low
as it takes a while to load topics that need to load replication clusters. I
have raised the timeout here: <https://github.com/apache/pulsar/pull/6750>
which I think should be sufficient and reasonable, but long term, it seems like
the brokerServer could use some refactoring such that it is more steps which
each individually have a timeout.
After some analysis, I do think this is helping in all other cases where see
issues with producers (but need to observe a bit longer to be doubly sure)
----
2020-04-17 06:47:29 UTC - Sijie Guo: @tuteng @Penghui Li ^^
----