[
https://issues.apache.org/jira/browse/KAFKA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruno Cadonna updated KAFKA-13887:
----------------------------------
Priority: Minor (was: Major)
> Running multiple instance of same stateful KafkaStreams application on single
> host raise Exception
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-13887
> URL: https://issues.apache.org/jira/browse/KAFKA-13887
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 2.6.0
> Reporter: Sina Askarnejad
> Priority: Minor
>
> KAFKA-10716 locks the state store directory on the running host, as it stores
> the processId in a *kafka-streams-process-metadata* file in this path. As a
> result, to run multiple instances of the same application on a single host,
> each instance must run with a different *state.dir* config; otherwise, the
> following exception is raised for the second instance:
>
> Exception in thread "main" org.apache.kafka.streams.errors.StreamsException:
> Unable to initialize state, this can happen if multiple instances of Kafka
> Streams are running in the same state directory
> at
> org.apache.kafka.streams.processor.internals.StateDirectory.initializeProcessId(StateDirectory.java:191)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:868)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:851)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:821)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:733)
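As a rough illustration of the workaround described above (a sketch, not part of the original report): each process can derive its own state.dir before it builds KafkaStreams. The instanceIndex argument, the "instance-N" subdirectory naming, and the class and method names are hypothetical; only the application.id and state.dir config keys come from Kafka Streams. Plain string keys are used so the snippet runs without the Kafka client on the classpath.

```java
import java.nio.file.Paths;
import java.util.Properties;

class PerInstanceStateDir {

    /**
     * Builds a config whose state.dir is unique per instance on this host.
     * instanceIndex is a hypothetical value the operator passes to each process.
     */
    static Properties configFor(String applicationId, String baseStateDir, int instanceIndex) {
        Properties props = new Properties();
        // Same application.id for every instance, so they join the same group.
        props.setProperty("application.id", applicationId);
        // Different state.dir per instance, so the directory lock never collides.
        props.setProperty("state.dir",
                Paths.get(baseStateDir, "instance-" + instanceIndex).toString());
        return props;
    }

    public static void main(String[] args) {
        int instanceIndex = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        Properties props = configFor("my-app", "/tmp/kafka-streams", instanceIndex);
        System.out.println("state.dir = " + props.getProperty("state.dir"));
        // These properties would then be passed to the KafkaStreams constructor.
    }
}
```

Starting one process with argument 0 and another with argument 1 gives each its own directory, at the cost of the operator having to hand out the indices manually.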
>
> The easiest solution is multi-threading: running a single instance with
> multiple threads. However, multi-threaded programming is not suitable for all
> scenarios, e.g., when the tasks are CPU intensive, in large-scale scenarios,
> or when fully utilizing multi-core CPUs.
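For reference, the multi-threading option mentioned above is typically a matter of configuration. The following is a minimal sketch, assuming the standard num.stream.threads config key; the class and method names are hypothetical, and string keys are used so that it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

class StreamThreadsConfig {

    /** Builds a config that runs the given number of stream threads in one instance. */
    static Properties withThreads(String applicationId, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        // One KafkaStreams instance, several processing threads.
        props.setProperty("num.stream.threads", Integer.toString(threads));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative choice: one stream thread per available core.
        int cores = Runtime.getRuntime().availableProcessors();
        Properties props = withThreads("my-app", cores);
        System.out.println("num.stream.threads = " + props.getProperty("num.stream.threads"));
    }
}
```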
>
> The second solution is multi-processing. On a single host, this solution
> needs extra work and supervision, as each instance must be run with a
> different *state.dir*. It would be a good enhancement if KafkaStreams could
> handle this config for multiple instances.
>
> The proposed solution is that KafkaStreams uses the
> */{state.dir}/{application.id}/{ordinal.number}* path instead of
> */{state.dir}/{application.id}* to store the meta file and states. The
> *ordinal.number* starts at 0 and is incremental.
> When an instance starts, it checks the ordinal.number directories starting
> from 0, finds the first subdirectory that is not locked, and uses that as its
> state directory. This way all tasks are assigned correctly on rebalance and
> multiple instances can run on a single host.
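The directory selection proposed above could look roughly like the sketch below. This is not Kafka Streams code: the class name, the ".lock" file name, the maxInstances bound, and the in-JVM bookkeeping set are all assumptions made for illustration. It uses only java.nio file locks, with the set guarding against a second claim from the same JVM, because on some platforms closing any channel on a file can release the JVM's locks on that file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class OrdinalStateDir {

    // Lock files already held by this JVM (hypothetical bookkeeping).
    static final Set<Path> HELD_IN_THIS_JVM = ConcurrentHashMap.newKeySet();

    /** A claimed ordinal directory together with the lock that reserves it. */
    static final class Claim implements AutoCloseable {
        final int ordinal;
        final Path dir;
        final Path lockFile;
        final FileChannel channel;
        final FileLock lock;

        Claim(int ordinal, Path dir, Path lockFile, FileChannel channel, FileLock lock) {
            this.ordinal = ordinal;
            this.dir = dir;
            this.lockFile = lockFile;
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            lock.release();
            channel.close();
            HELD_IN_THIS_JVM.remove(lockFile);
        }
    }

    /** Scans appDir/0, appDir/1, ... and claims the first directory that is not locked. */
    static Claim claimFirstFree(Path appDir, int maxInstances) throws IOException {
        for (int ordinal = 0; ordinal < maxInstances; ordinal++) {
            Path dir = appDir.resolve(Integer.toString(ordinal));
            Files.createDirectories(dir);
            Path lockFile = dir.resolve(".lock").toAbsolutePath().normalize();
            if (!HELD_IN_THIS_JVM.add(lockFile)) {
                continue; // already claimed by this JVM
            }
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock(); // null if another process holds it
            if (lock != null) {
                return new Claim(ordinal, dir, lockFile, channel, lock);
            }
            channel.close();
            HELD_IN_THIS_JVM.remove(lockFile);
        }
        throw new IOException("No free ordinal directory under " + appDir);
    }

    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("state-dir-demo").resolve("my-app");
        try (Claim first = claimFirstFree(appDir, 8);
             Claim second = claimFirstFree(appDir, 8)) {
            System.out.println("first instance uses  " + first.dir);
            System.out.println("second instance uses " + second.dir);
        }
    }
}
```

A released ordinal becomes available again to the next instance that starts, which is what lets a restarted instance pick up the state left in the same subdirectory.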
--
This message was sent by Atlassian Jira
(v8.20.7#820007)