[
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690560#comment-17690560
]
Matthias J. Sax commented on KAFKA-14713:
-----------------------------------------
Ah. Thanks. That makes sense. Did not look into the consumer code, only
streams. So it's fixed via https://issues.apache.org/jira/browse/KAFKA-12980 in
3.2.0 – updated the ticket accordingly. Thanks for getting back. It bugged my
that I did not understand it :)
> Kafka Streams global table startup takes too long
> -------------------------------------------------
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.0.2
> Reporter: Tamas
> Priority: Critical
> Fix For: 3.2.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is
> listening to two topics. Let's call them apartment and visitor. The
> apartments are stored in a global table, while the visitors are in the stream
> we are processing, and at one point we are joining the visitor stream
> together with the apartment table. In our test environment, both topics
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to
> start up, because the global table takes 30 seconds (default value for the
> global request timeout) to accept that there are no messages in the apartment
> topics, for each and every partition. If we send out the list of apartments
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup
> time would be 48 minutes. Not having messages in the topics between
> application shutdown and restart is a valid use case, so this is quite a big
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for
> the global table initialization, but a global request timeout for a lot of
> things, we do not know what else it will affect, so we are not very keen on
> doing that. Even then, it would mean a 1.5 minute delay for this particular
> client (more if we will have other use cases in the future where we will need
> to use more global tables), which is far too much, considering that the
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
> # Introduce a specific global table initialization timeout in
> GlobalStateManagerImpl. Then we would be able to safely modify that value
> without fear of making some other part of kafka unstable.
> # Parallelize the initialization of the global table partitions in
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead
> of linear with the number of partitions would be a huge help.
> # As long as we receive a response, accept the empty map in the
> KafkaConsumer, and continue instead of going into a busy-waiting loop.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)