I found something that might be relevant: https://issues.apache.org/jira/browse/SPARK-45101
I tried to reproduce it in a lower environment but found no issues, although
there is a version mismatch: the lower environment runs Spark 3.2.3 and
*production runs 3.2.0.*
Thanks & Regards,
Nayan Sharma
 *+91-8095382952*

<https://www.linkedin.com/in/nayan-sharma>
<http://stackoverflow.com/users/3687426/nayan-sharma?tab=profile>


On Tue, Apr 8, 2025 at 5:13 PM Ángel Álvarez Pascua <
angel.alvarez.pas...@gmail.com> wrote:

> Hi Nayan,
>
> Your issue is quite interesting indeed. I'm not a super expert on Spark
> Structured Streaming, but I faced a similar issue last year, and I still
> have it pending to look into on Databricks and write an article about it.
> Do you by any chance end up having native OOMs? Have you analyzed or
> played with the backpressure configuration?
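>
> Just to be concrete about what I mean by backpressure: the classic
> spark.streaming.backpressure.enabled flag only applies to the old DStreams
> API; in Structured Streaming the equivalent knob is a per-source rate
> limit. A rough sketch (the Solace format and option names below are
> placeholders, please check your connector's documentation):
>
>   import org.apache.spark.sql.SparkSession
>
>   object RateLimitSketch {
>     def main(args: Array[String]): Unit = {
>       val spark = SparkSession.builder().appName("solace-to-kafka").getOrCreate()
>       // No global backpressure flag in Structured Streaming: instead, cap
>       // what each micro-batch is allowed to pull from the source.
>       val df = spark.readStream
>         .format("solace")                          // placeholder format name
>         .option("maxMessagesPerTrigger", "10000")  // placeholder option name
>         .load()
>       df.printSchema()
>     }
>   }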
>
> As you said, the thread dumps don't give much info, but maybe a heap dump
> would be much more insightful.
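>
> If taking a heap dump by hand is awkward, one low-effort option is to let
> the JVM write one automatically when it hits a heap OOM. A sketch of the
> settings (the dump path is a placeholder; note this catches JVM heap OOMs,
> not container-level/native kills):
>
>   import org.apache.spark.sql.SparkSession
>
>   object HeapDumpOnOomSketch {
>     def main(args: Array[String]): Unit = {
>       // Standard JVM flags passed through Spark's extraJavaOptions.
>       val spark = SparkSession.builder()
>         .config("spark.executor.extraJavaOptions",
>           "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-oom.hprof")
>         .getOrCreate()
>     }
>   }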
>
> Have you tried to reproduce this issue in a simpler, isolated testing
> environment? I would be super glad to help ... I'm passionate (and
> stubborn) about this kind of mystery ...
>
> Regards,
> Ángel.
>
> On Mon, 7 Apr 2025 at 11:26, nayan sharma (<nayansharm...@gmail.com>)
> wrote:
>
>> I took thread dumps from 3 nodes but couldn't figure out anything from
>> them. It took almost 25 hours before the job started showing the
>> stuck-active-task behaviour.
>>
>> We have set the trigger to 1 minute. A batch that completes successfully
>> has 32 tasks in it, but the batch that gets stuck has 34 tasks in total:
>> 32 show as completed and 2 stay active.
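>>
>> To make the setup concrete, here is a minimal sketch of a 1-minute trigger
>> writing to Kafka (broker, topic and checkpoint path are placeholders, and
>> a rate source stands in for the Solace reader so the sketch is
>> self-contained):
>>
>>   import org.apache.spark.sql.SparkSession
>>   import org.apache.spark.sql.streaming.Trigger
>>
>>   object OneMinuteTriggerSketch {
>>     def main(args: Array[String]): Unit = {
>>       val spark = SparkSession.builder().appName("trigger-sketch").getOrCreate()
>>       // Stand-in source so the sketch runs anywhere; the real job reads Solace.
>>       val in = spark.readStream.format("rate").option("rowsPerSecond", "10").load()
>>       in.selectExpr("CAST(value AS STRING) AS value") // Kafka sink expects a 'value' column
>>         .writeStream
>>         .format("kafka")
>>         .option("kafka.bootstrap.servers", "broker:9092")                    // placeholder
>>         .option("topic", "output-topic")                                     // placeholder
>>         .option("checkpointLocation", "hdfs:///checkpoints/solace-to-kafka") // placeholder
>>         .trigger(Trigger.ProcessingTime("1 minute")) // the 1-minute trigger mentioned above
>>         .start()
>>         .awaitTermination()
>>     }
>>   }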
>>
>> Thanks & Regards,
>> Nayan Sharma
>>  *+91-8095382952*
>>
>> <https://www.linkedin.com/in/nayan-sharma>
>> <http://stackoverflow.com/users/3687426/nayan-sharma?tab=profile>
>>
>>
>> On Sun, Apr 6, 2025 at 2:21 PM Shay Elbaz <sel...@paypal.com> wrote:
>>
>>> Hi Nayan,
>>>
>>>
>>>
>>> Did you try to take a thread dump while there are accumulated jobs?
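>>>
>>> The easiest way is the "Thread Dump" link on the Executors tab of the
>>> Spark UI. If the UI is not reachable, a rough driver-side alternative
>>> using only standard JDK APIs would be something like:
>>>
>>>   import scala.collection.JavaConverters._
>>>
>>>   object ThreadDumpSketch {
>>>     def dump(): Unit =
>>>       for ((t, frames) <- Thread.getAllStackTraces.asScala) {
>>>         println(s"--- ${t.getName} (state=${t.getState}) ---")
>>>         frames.foreach(f => println(s"    at $f"))
>>>       }
>>>   }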
>>>
>>>
>>>
>>> Shay
>>>
>>>
>>>
>>> *From: *nayan sharma <nayansharm...@gmail.com>
>>> *Date: *Sunday, 6 April 2025 at 6:23
>>> *To: *user.spark <user@spark.apache.org>
>>> *Subject: *High count of Active Jobs
>>>
>>>
>>> Hi,
>>>
>>> I am facing an issue with a high number of active jobs showing in the UI.
>>> I am using Spark Structured Streaming to read data from Solace and write
>>> it back to Kafka.
>>>
>>> After 24-28 hours, active jobs start accumulating, and it happens at
>>> random intervals after that point. I can also see a very large number of
>>> active tasks (more than 107817), and the job has been up for 58 hours.
>>>
>>>
>>>
>>> Spark config:
>>> executors: 15
>>> executor cores: 5
>>> executor memory: 10g
>>> driver memory: 15g
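>>>
>>> For clarity, that sizing expressed as standard Spark properties (only the
>>> four values above come from our job; the app name is a placeholder):
>>>
>>>   import org.apache.spark.sql.SparkSession
>>>
>>>   object JobConfigSketch {
>>>     def main(args: Array[String]): Unit = {
>>>       val spark = SparkSession.builder()
>>>         .appName("solace-to-kafka")               // placeholder name
>>>         .config("spark.executor.instances", "15")
>>>         .config("spark.executor.cores", "5")
>>>         .config("spark.executor.memory", "10g")
>>>         .config("spark.driver.memory", "15g")     // in client mode, set at submit time instead
>>>         .getOrCreate()
>>>     }
>>>   }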
>>>
>>>
>>>
>>> The job's main flow is working absolutely fine: I can see data getting
>>> populated in Kafka, and batch data is stored on HDFS every minute. There
>>> are no issues with the main flow.
>>>
>>>
>>>
>>> I did a lot of research but had no luck. Some say GC is not working
>>> properly, or that the Spark task scheduler gets out of sync. Has anybody
>>> faced such an issue in the past, or can you guide me on where to look?
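>>>
>>> If GC is the suspect, one cheap check is to turn on GC logging on the
>>> executors and see whether the stuck periods line up with long pauses.
>>> A rough sketch with JDK 8-style flags (use -Xlog:gc* on JDK 11+):
>>>
>>>   import org.apache.spark.sql.SparkSession
>>>
>>>   object GcLoggingSketch {
>>>     def main(args: Array[String]): Unit = {
>>>       // GC output shows up in each executor's stdout in the Spark UI.
>>>       val gcFlags = "-XX:+PrintGCDetails -XX:+PrintGCDateStamps"
>>>       val spark = SparkSession.builder()
>>>         .config("spark.executor.extraJavaOptions", gcFlags)
>>>         .config("spark.driver.extraJavaOptions", gcFlags) // driver flags usually need submit time
>>>         .getOrCreate()
>>>     }
>>>   }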
>>>
>>>
>>>
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Nayan Sharma
>>>
>>> *+91-8095382952*
>>>
>>>
>>>
>>> <https://www.linkedin.com/in/nayan-sharma>
>>> <http://stackoverflow.com/users/3687426/nayan-sharma?tab=profile>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
