[
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688945#comment-17688945
]
Tamas commented on KAFKA-14713:
-------------------------------
Hi [~mjsax], it looks similar, but not exactly the same. They see this issue
with the exactly_once_beta processing guarantee, while we have it with the
default at_least_once. For us the important part is that this issue is fixed
as soon as possible, because right now I am between a rock and a hard place
because of it.
> Kafka Streams global table startup takes too long
> -------------------------------------------------
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Tamas
> Priority: Critical
>
> *Some context first*
> We have a Spring-based Kafka Streams application. This application
> listens to two topics. Let's call them apartment and visitor. The
> apartments are stored in a global table, while the visitors are in the stream
> we are processing, and at one point we are joining the visitor stream
> together with the apartment table. In our test environment, both topics
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to
> start up (10 partitions x 30 seconds), because the global table takes 30
> seconds (the default value for the global request timeout) to accept that
> there are no new messages in the apartment topic, for each and every
> partition. If we send out the list of apartments as new messages, the
> application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup
> time would be 48 minutes. Receiving no new messages between application
> shutdown and restart is a valid use case, so this is quite a big problem.
> *Possible workarounds*
> We could reduce the request timeout, but this value is not specific to
> global table initialization; it is a global request timeout used for many
> other things. We do not know what else it would affect, so we are not very
> keen on doing that. Even then, it would mean a 1.5 minute delay for this
> particular client (more if future use cases require additional global
> tables), which is far too much, considering that the application would
> otherwise be able to start in about 20 seconds.
> *Potential solutions we see*
> # Introduce a specific global table initialization timeout in
> GlobalStateManagerImpl. Then we would be able to safely modify that value
> without fear of making some other part of Kafka unstable.
> # Parallelize the initialization of the global table partitions in
> GlobalStateManagerImpl: a startup delay that is constant instead of linear
> in the number of partitions would be a huge help.
> # As long as we receive a response, accept the empty map in the
> KafkaConsumer and continue, instead of going into a busy-waiting loop.
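
To make the numbers in the description easier to check, here is a rough back-of-the-envelope model of the startup delay. It is only a sketch, not Kafka code: the class and method names are made up for illustration, and it assumes, as the report describes, that every partition without new messages waits out the full timeout before the next one is handled. The "parallel" variant models potential solution 2 under the assumption that all partitions could wait concurrently.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Kafka Streams.
public class GlobalTableStartupEstimate {

    // Sequential initialization: each partition with nothing new to read
    // waits out the full timeout, one partition after the other.
    public static Duration sequentialDelay(int partitions, Duration timeoutPerPartition) {
        return timeoutPerPartition.multipliedBy(partitions);
    }

    // Assumed parallel initialization (potential solution 2): all partitions
    // wait at the same time, so the delay does not grow with the partition count.
    public static Duration parallelDelay(int partitions, Duration timeoutPerPartition) {
        return partitions == 0 ? Duration.ZERO : timeoutPerPartition;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30); // the default timeout cited in the report
        for (int partitions : new int[] {10, 96}) {
            System.out.printf("%d partitions: sequential %d min, parallel %d s%n",
                    partitions,
                    sequentialDelay(partitions, timeout).toMinutes(),
                    parallelDelay(partitions, timeout).getSeconds());
        }
    }
}
```

With the 30 second timeout from the report, the sequential model gives 5 minutes for 10 partitions and 48 minutes for 96 partitions, which matches the figures above.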