Shuffle TTLs

2024-10-16 Thread Holden Karau
Hi Spark Devs, So back in Spark 1.X we had shuffle TTLs, but they did not take last access times into account. With the increased use of notebooks, where DataFrames and RDDs are more likely to be defined at the global scope, I was thinking it could be a good time to try to re-introduce shuffle TTLs b

Re: Shuffle TTLs

2024-10-16 Thread Reynold Xin
Thanks for bringing this up. Wouldn't it be better for the notebooks to control when these DFs/RDDs expire, so they can have fine-grained control?

Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread Dongjoon Hyun
Hi, All. Since the Apache Spark 3.4.0 RC7 vote passed on Apr 6, 2023, branch-3.4 has been maintained and served well until now. - https://github.com/apache/spark/releases/tag/v3.4.0 (tagged on Apr 6, 2023) - https://lists.apache.org/thread/0o61jn9cmg6r0f22ljgjg5c31z8fn0zn (vote result on April 13

Re: Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread Holden Karau
+1 on a 3.4.4 EOL release

Re: Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread Jungtaek Lim
There is another open correctness issue for the 3.4 version line - PR is up and approved by a non-committer, and I'm struggling to find a committer to review and approve. Issue: https://issues.apache.org/jira/browse/SPARK-49829 PR: https://github.com/apache/spark/pull/48297 I'd propose to include

Re: Shuffle TTLs

2024-10-16 Thread Holden Karau
It’s a good suggestion; however, I don’t think there is a mechanism for TTLs in notebooks, and most things in notebooks might not be safe to recompute, unlike shuffle files, which can be recomputed if we delete them.

Re: Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread L. C. Hsieh
+1 Thanks Dongjoon.

Re: Apache Spark 3.4.4 EOL Release?

2024-10-16 Thread huaxin gao
+1

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-16 Thread Russell Jurney
For starters I created a ticket. I'm going to work on the project a bit and then name a date and time. https://github.com/graphframes/graphframes/issues/460 On Tue, Oct 15, 2024 at 7:48 PM Ángel wrote: > We could create a prioritized list of the most important bugs to fix first > and distribute

Re: [DISCUSS] Migrate or deprecate the Spark Kinesis connector

2024-10-16 Thread Jungtaek Lim
DStream was deprecated in Spark 3.4.0, so the Kinesis connector for DStream shares the same fate. We just didn't make every DStream class produce warning messages; we made the entry class produce them and thought that was sufficient. On Mon, Oct 14, 2024 at 5:03 PM Joh