[jira] [Updated] (SPARK-51111) Streaming job gets stuck during the startup of Driver for the consumers are in the rebalance

Mars (Jira) Thu, 06 Feb 2025 05:40:33 -0800


     [ 
https://issues.apache.org/jira/browse/SPARK-51111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mars updated SPARK-51111:
-------------------------
    Description: 
When multiple Kafka `InputDStream`s are used (more of them than the Driver 
machine's `Runtime.getRuntime.availableProcessors`) and their consumers are 
configured with the same group id, the Driver remains stuck and logs 
'Request joining group due to: group is already rebalancing'.

This is more likely to occur in a k8s environment, because there the number of 
cores of the Driver is exactly the number of cores of the physical machine 
(`Runtime.getRuntime.availableProcessors`).



ROOT CAUSE:
In the code, a Scala {{ParVector}} is used to initialize the streams, and it 
runs on a thread pool whose number of threads equals 
{{Runtime.getRuntime.availableProcessors}}. When the number of streams 
exceeds the number of threads in the pool, each remaining stream has to wait 
for an earlier stream to finish initializing before it can start, so part of 
the initialization runs sequentially.
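
The effect of a pool that is smaller than the number of streams can be 
illustrated without Spark or Kafka. The sketch below is not Spark's code: it 
assumes a plain fixed-size java.util.concurrent pool in place of the pool 
behind the parallel collection, and uses Thread.sleep to stand in for a stream 
start that blocks. The names PoolWaves and elapsedMillis are hypothetical.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object PoolWaves {
  /** Runs `tasks` blocking jobs of `taskMillis` each on `threads` workers
    * and returns the total elapsed time in milliseconds. */
  def elapsedMillis(threads: Int, tasks: Int, taskMillis: Long): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val begin = System.nanoTime()
    (1 to tasks).foreach { _ =>
      pool.execute(new Runnable {
        // Stand-in for a stream start that blocks (assumption, not Spark code).
        override def run(): Unit = Thread.sleep(taskMillis)
      })
    }
    pool.shutdown()                            // accept no new jobs
    pool.awaitTermination(1, TimeUnit.MINUTES) // wait for the queued ones
    (System.nanoTime() - begin) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors
    // Twice as many jobs as workers: the second half waits for the first.
    println(s"pool = $cores, jobs = ${2 * cores}: " +
      s"${elapsedMillis(cores, 2 * cores, 200L)} ms")
    // One worker per job: nothing has to wait for a free thread.
    println(s"pool = ${2 * cores}, jobs = ${2 * cores}: " +
      s"${elapsedMillis(2 * cores, 2 * cores, 200L)} ms")
  }
}
```

With more jobs than workers, the extra jobs only begin once a worker is free, 
so the total time grows with the number of such waves.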

If the consumers of the InputStreams for different topics are configured with 
the same group id, the Kafka server triggers a rebalance each time a new 
consumer joins the group. A rebalance can be time-consuming, and when there 
are many streams it can make the initialization process hang for a very long 
time.

It would be better to create a thread pool with a specified number of threads 
for initialization. This can help the Driver complete the process faster.
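
One way to read this proposal is sketched below. It is only an illustration 
under stated assumptions, not the change made in Spark: DedicatedStartPool and 
startAll are hypothetical names, each stream start is modelled as a plain 
() => Unit action, and the pool size is passed in explicitly instead of being 
derived from the machine's core count.

```scala
import java.util.ArrayList
import java.util.concurrent.{Callable, Executors}

object DedicatedStartPool {
  /** Runs every start action on a pool of exactly `parallelism` threads
    * and returns once all of them have finished. */
  def startAll(starts: Seq[() => Unit], parallelism: Int): Unit = {
    require(parallelism > 0, "parallelism must be positive")
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      val jobs = new ArrayList[Callable[Unit]]()
      starts.foreach { start =>
        jobs.add(new Callable[Unit] { override def call(): Unit = start() })
      }
      // invokeAll blocks until every job has completed.
      val futures = pool.invokeAll(jobs).iterator()
      // get() surfaces a failed start as an ExecutionException.
      while (futures.hasNext) futures.next().get()
    } finally {
      pool.shutdown()
    }
  }
}
```

A caller could size the pool to the number of actions, for example 
DedicatedStartPool.startAll(starts, parallelism = starts.size), so that no 
action has to wait for a free thread; whether that size is appropriate for a 
given Driver is a separate decision.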

  was:
When multiple Kafka `InputDStream`s are used (more of them than the Driver 
machine's `Runtime.getRuntime.availableProcessors`) and their consumers are 
configured with the same group id, the Driver remains stuck and logs 
'Request joining group due to: group is already rebalancing'.
This is more likely to occur in a k8s environment, because there the number of 
cores of the Driver is exactly the number of cores of the physical machine 
(`Runtime.getRuntime.availableProcessors`).

The root cause:
In the code, a Scala {{ParVector}} is used to initialize the streams, and it 
runs on a thread pool whose number of threads equals 
{{Runtime.getRuntime.availableProcessors}}. When the number of streams 
exceeds the number of threads in the pool, each remaining stream has to wait 
for an earlier stream to finish initializing before it can start, so part of 
the initialization runs sequentially.

If the consumers of the InputStreams for different topics are configured with 
the same group id, the Kafka server triggers a rebalance each time a new 
consumer joins the group. A rebalance can be time-consuming, and when there 
are many streams it can make the initialization process hang for a very long 
time.

It would be better to create a thread pool with a specified number of threads 
for initialization. This can help the Driver complete the process faster.


> Streaming job gets stuck during the startup of Driver for the consumers are 
> in the rebalance
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-51111
>                 URL: https://issues.apache.org/jira/browse/SPARK-51111
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: Mars
>            Priority: Major


