[
https://issues.apache.org/jira/browse/SPARK-51111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mars updated SPARK-51111:
-------------------------
Description:
When multiple Kafka `InputDStream` are used (number exceeding the Driver's
physical machine `Runtime.getRuntime.availableProcessors`), and the consumers
are configured with the same group id. Driver will remains stuck and log
'Request joining group due to: group is already rebalancing'.
It is more likely to occur in the k8s environment, because the cores of the
Driver is exactly the same as the cores of the physical machine
(`Runtime.getRuntime.availableProcessors`) .
ROOT CAUSE:
In the code, Scala {{ParVector}} is used to initialize the streams, and this
variable creates a thread pool with a number of threads equal to
{{{}Runtime.getRuntime.availableProcessors{}}}. When the number of streams
exceeds the number of threads in the pool, it needs to wait for the previous
stream to complete before initializing the next one, leading to sequential
waiting.
If the consumers of different topic InputStreams are configured with the same
group id, the Kafka server will trigger a rebalance operation due to the new
consumers joining the group. This rebalance operation can be time-consuming,
and when there are many streams, it can cause the initialization process to
hang very long time.
We'd better to create a thread pool with a specified number of threads for
initialization. This can help the Driver complete the process faster.
was:
When multiple Kafka `InputDStream` are used (number exceeding the Driver's
physical machine `Runtime.getRuntime.availableProcessors`), and the consumers
are configured with the same group id. Driver will remains stuck and log
'Request joining group due to: group is already rebalancing'.
It is more likely to occur in the k8s environment, because the cores of the
Driver is exactly the same as the cores of the physical machine
(`Runtime.getRuntime.availableProcessors`) .
The root cause:
In the code, Scala {{ParVector}} is used to initialize the streams, and this
variable creates a thread pool with a number of threads equal to
{{{}Runtime.getRuntime.availableProcessors{}}}. When the number of streams
exceeds the number of threads in the pool, it needs to wait for the previous
stream to complete before initializing the next one, leading to sequential
waiting.
If the consumers of different topic InputStreams are configured with the same
group id, the Kafka server will trigger a rebalance operation due to the new
consumers joining the group. This rebalance operation can be time-consuming,
and when there are many streams, it can cause the initialization process to
hang very long time.
We'd better to create a thread pool with a specified number of threads for
initialization. This can help the Driver complete the process faster.
> Streaming job gets stuck during the startup of Driver for the consumers are
> in the rebalance
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-51111
> URL: https://issues.apache.org/jira/browse/SPARK-51111
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Reporter: Mars
> Priority: Major
>
> When multiple Kafka `InputDStream` are used (number exceeding the Driver's
> physical machine `Runtime.getRuntime.availableProcessors`), and the consumers
> are configured with the same group id. Driver will remains stuck and log
> 'Request joining group due to: group is already rebalancing'.
> It is more likely to occur in the k8s environment, because the cores of the
> Driver is exactly the same as the cores of the physical machine
> (`Runtime.getRuntime.availableProcessors`) .
> ROOT CAUSE:
> In the code, Scala {{ParVector}} is used to initialize the streams, and this
> variable creates a thread pool with a number of threads equal to
> {{{}Runtime.getRuntime.availableProcessors{}}}. When the number of streams
> exceeds the number of threads in the pool, it needs to wait for the previous
> stream to complete before initializing the next one, leading to sequential
> waiting.
> If the consumers of different topic InputStreams are configured with the same
> group id, the Kafka server will trigger a rebalance operation due to the new
> consumers joining the group. This rebalance operation can be time-consuming,
> and when there are many streams, it can cause the initialization process to
> hang very long time.
> We'd better to create a thread pool with a specified number of threads for
> initialization. This can help the Driver complete the process faster.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]