2020-04-16 15:45:50 UTC - matt_innerspace.io: any chance we can get another 
docker image for `2.5.1 rc4`?  Or instructions on how to do it?  cc: @Addison 
Higham?
----
2020-04-16 15:50:22 UTC - Addison Higham: building the docker images is 
*fairly* straightforward: `mvn package -Pdocker -DskipTests`, but it works best 
in a new clone of the repo; there tend to be a few droppings left behind that 
aren't properly cleaned up (the c++/python client build leaves around some 
files on disk that need to be cleaned up)
----
2020-04-16 15:50:44 UTC - Addison Higham: that produces images with the 
`apachepulsar/pulsar-all` name, I just re-tag them and push
----
2020-04-16 15:50:51 UTC - Addison Higham: but, I am kicking off a build right 
now :slightly_smiling_face:
----
2020-04-16 15:51:00 UTC - Addison Higham: will push and let you know when it is 
done
+1 : matt_innerspace.io
----
2020-04-16 15:51:49 UTC - matt_innerspace.io: do you do the build in the 
`apachepulsar/pulsar-build` container?
----
2020-04-16 16:22:12 UTC - Addison Higham: I don't run anything in there 
directly, the mvn process does use that container though, but I don't believe 
any java compilation happens there
----
2020-04-16 17:53:03 UTC - Sijie Guo: @matt_innerspace.io you can try this 
`docker pull streamnative/pulsar-all:v2.5.1-candidate-4`
+1 : matt_innerspace.io, Konstantinos Papalias
----
2020-04-16 18:07:22 UTC - Addison Higham: woohoo! I was having issues with a 
build on mine that I got distracted on (though I think I did just figure it out)
----
2020-04-16 22:47:39 UTC - Addison Higham: FYI, deployed rc4 to both beta and 
prod clusters, deployed to prod to see if it fixed the issue we are having with 
producers that we don't have a reliable repro for. So far so good.
----
2020-04-17 00:12:22 UTC - Addison Higham: well... the good news, we don't see 
any regressions, but the patches I submitted that I *hoped* would fix my 
producers not getting cleaned up (resulting in a ton of "producer already 
exists" errors) didn't work (surprisingly...), but it did give some more 
concrete data; thread here
----
2020-04-17 00:13:06 UTC - Addison Higham: So, I added code to ensure that 
futures returned out of the broker service time out; now I get this new error:
``` 23:24:11.077 [pulsar-io-24-3] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.9.127.254:55768] Failed to create topic persistent://notification-service-prod/bridge/message-notification-center
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Future didn't finish within deadline
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at org.apache.pulsar.broker.service.BrokerService.lambda$futureWithDeadline$18(BrokerService.java:740) ~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]: Caused by: java.util.concurrent.TimeoutException: Future didn't finish within deadline
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at org.apache.pulsar.broker.service.BrokerService.futureWithDeadline(BrokerService.java:747) ~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at org.apache.pulsar.broker.service.BrokerService.loadOrCreatePersistentTopic(BrokerService.java:839) ~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]
pulsar/pulsar-prod-broker-7899c598c8-ld7jx[broker]:     at org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$13(BrokerService.java:678) ~[org.apache.pulsar-pulsar-broker-2.5.1.jar:2.5.1]```
----
2020-04-17 00:13:55 UTC - Addison Higham: this confirms that
```BrokerService.loadOrCreatePersistentTopic```
----
2020-04-17 00:14:01 UTC - Addison Higham: is just stalling out for unknown 
reasons
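
[Editor's note] For readers unfamiliar with the pattern behind this error, here is a minimal sketch of a deadline wrapper around a `CompletableFuture`, assuming Java 8 and a plain `ScheduledExecutorService`. The class and method names (`DeadlineSketch`, `withDeadline`, `describeFailure`) are hypothetical and this is not the code in `BrokerService.futureWithDeadline`; it only illustrates why a dependent stage reports a `CompletionException` whose cause is a `TimeoutException`, as in the trace above.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; not the Pulsar BrokerService implementation.
class DeadlineSketch {

    // Returns a future that mirrors `source`, but fails with a TimeoutException
    // if `source` has not completed within `timeout`.
    static <T> CompletableFuture<T> withDeadline(CompletableFuture<T> source,
                                                 Duration timeout,
                                                 ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            // No effect if `result` was already completed by `source`.
            result.completeExceptionally(
                    new TimeoutException("Future didn't finish within deadline"));
        }, timeout.toMillis(), TimeUnit.MILLISECONDS);
        source.whenComplete((value, error) -> {
            timer.cancel(false); // the deadline is no longer needed
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        });
        return result;
    }

    // Waits for the future and describes how it ended.
    static String describeFailure(CompletableFuture<?> future) {
        try {
            future.join();
            return "completed";
        } catch (CompletionException e) {
            return e.getClass().getSimpleName() + " caused by "
                    + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<String> stalled = new CompletableFuture<>(); // never completes
            CompletableFuture<Integer> dependent =
                    withDeadline(stalled, Duration.ofMillis(50), scheduler)
                            .thenApply(String::length);
            System.out.println(describeFailure(dependent));
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

Note that a wrapper like this only stops the caller from waiting; it does not cancel or explain whatever the stalled operation is doing.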
----
2020-04-17 00:15:14 UTC - Addison Higham: the extra mystery, in 
`ServerCnx.java:1091` which ultimately calls into 
`loadOrCreatePersistentTopic`, we have the following code that should ensure 
producers are cleaned up:
``` Throwable cause = exception.getCause();
if (!(cause instanceof ServiceUnitNotReadyException)) {
    // Do not print stack traces for expected exceptions
    log.error("[{}] Failed to create topic {}", remoteAddress, topicName, exception);
}

// If client timed out, the future would have been completed
// by subsequent close. Send error back to
// client, only if not completed already.
if (producerFuture.completeExceptionally(exception)) {
    ctx.writeAndFlush(Commands.newError(requestId,
            BrokerServiceException.getClientErrorCode(cause), cause.getMessage()));
}
producers.remove(producerId, producerFuture);

return null;```
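
[Editor's note] The cleanup above runs inside an error handler attached to the topic-loading future, so it can only run once that future completes. The following is a minimal sketch of that dependency, with hypothetical names (`ProducerCleanupSketch`, `register`) and a simplified map; it is not the `ServerCnx` code, and whether this is what happened on the affected brokers is an assumption consistent with the thread rather than something established in it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the producer bookkeeping; not Pulsar code.
class ProducerCleanupSketch {
    final ConcurrentHashMap<Long, CompletableFuture<String>> producers =
            new ConcurrentHashMap<>();

    // Registers a pending producer and ties its cleanup to the topic-load future.
    // Returns false when the id is already registered, which is analogous to a
    // "producer already exists" rejection.
    boolean register(long producerId, CompletableFuture<String> topicFuture) {
        CompletableFuture<String> producerFuture = new CompletableFuture<>();
        if (producers.putIfAbsent(producerId, producerFuture) != null) {
            return false;
        }
        topicFuture.whenComplete((topic, error) -> {
            if (error != null) {
                producerFuture.completeExceptionally(error);
                // Conditional remove: only drops the entry if it still maps
                // to this exact future.
                producers.remove(producerId, producerFuture);
            } else {
                producerFuture.complete(topic);
            }
        });
        return true;
    }

    public static void main(String[] args) {
        ProducerCleanupSketch sketch = new ProducerCleanupSketch();
        CompletableFuture<String> stalled = new CompletableFuture<>();

        System.out.println(sketch.register(1L, stalled));                   // first attempt
        System.out.println(sketch.register(1L, new CompletableFuture<>())); // retry while stalled

        // Once the stalled load is failed (for example by a deadline),
        // the handler runs and the entry is removed.
        stalled.completeExceptionally(new TimeoutException("deadline"));
        System.out.println(sketch.producers.isEmpty());
    }
}
```

In this sketch the retry is rejected for as long as the first load neither succeeds nor fails, which is why forcing a timeout on the load also makes the cleanup path reachable.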
----
2020-04-17 04:56:24 UTC - Addison Higham: okay, so I think that this change 
<https://github.com/apache/pulsar/pull/6489> has introduced a minor regression 
for loading topics with many replicated clusters. The timeout is simply too low, 
as it takes a while to load topics that need to load replication clusters. I 
have raised the timeout here: <https://github.com/apache/pulsar/pull/6750> 
which I think should be sufficient and reasonable, but long term, it seems like 
the broker service could use some refactoring so that topic loading is split 
into more steps, each with its own timeout.

After some analysis, I do think this is helping in all other cases where we see 
issues with producers (but need to observe a bit longer to be doubly sure)
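
[Editor's note] The long-term idea of giving each step its own timeout could look roughly like the sketch below. It assumes Java 9 or later for `CompletableFuture.orTimeout` (the broker in the trace above runs Java 8, so this would not apply there as-is), and the step names `open-ledger` and `start-replicators` as well as the class `StepTimeoutSketch` are invented for illustration; they do not describe how the broker actually loads a topic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration of per-step deadlines; not Pulsar code.
class StepTimeoutSketch {

    // Runs one named step with its own time budget. Any failure, including
    // a timeout, is reported with the name of the step that caused it.
    static <T> CompletableFuture<T> step(String name, long timeoutMs,
                                         Supplier<CompletableFuture<T>> work) {
        return work.get()
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                .exceptionally(error -> {
                    throw new CompletionException("step '" + name + "' failed", error);
                });
    }

    // Waits for the pipeline and returns the failure message, or "ok".
    static String firstFailure(CompletableFuture<?> pipeline) {
        try {
            pipeline.join();
            return "ok";
        } catch (CompletionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // First step finishes immediately; second step never completes.
        CompletableFuture<String> ledger =
                step("open-ledger", 1000, () -> CompletableFuture.completedFuture("ledger"));
        CompletableFuture<String> pipeline = ledger.thenCompose(l ->
                step("start-replicators", 50, () -> new CompletableFuture<String>()));
        System.out.println(firstFailure(pipeline));
    }
}
```

The benefit over a single overall timeout is diagnostic: the failure names the step that ran out of time, so a slow replication setup would not be indistinguishable from a slow ledger open. As with any such wrapper, `orTimeout` does not stop the underlying work.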
----
2020-04-17 06:47:29 UTC - Sijie Guo: @tuteng @Penghui Li ^^
----
