[PSA] GitHub Actions for releasing Apache Spark

2025-05-28 Thread Hyukjin Kwon
Hi all, I would like to share that a GitHub Actions workflow for releasing Apache Spark is now available. The workflow is at https://github.com/apache/spark/blob/master/.github/workflows/release.yml. Apache Spark 3.5.6 was the first release done with this workflow. I plan to make some more changes

[ANNOUNCE] Apache Spark 3.5.6 released

2025-05-28 Thread Hyukjin Kwon
Hi all, We are happy to announce the availability of *Apache Spark 3.5.6*! To download Spark 3.5.6, head over to the download page: https://spark.apache.org/downloads.html Spark 3.5.6 is also available on PyPI (pyspark), Docker Hub

Unsubscribe

2025-05-28 Thread Jeremy Chung
Unsubscribe

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Yuanjian Li
+1 On Wed, May 28, 2025 at 19:31, Kent Yao wrote: > +1, LGTM. > > Kent > > On Thursday, May 29, 2025, Chao Sun wrote: > >> +1. Super excited by this initiative! >> >> On Wed, May 28, 2025 at 1:54 PM Yanbo Liang wrote: >> >>> +1 >>> >>> On Wed, May 28, 2025 at 12:34 PM huaxin gao >>> wrote: >>> +1 By unifying ba

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Kent Yao
+1, LGTM. Kent On Thursday, May 29, 2025, Chao Sun wrote: > +1. Super excited by this initiative! > > On Wed, May 28, 2025 at 1:54 PM Yanbo Liang wrote: > >> +1 >> >> On Wed, May 28, 2025 at 12:34 PM huaxin gao >> wrote: >> >>> +1 >>> By unifying batch and low-latency streaming in Spark, we can eliminate >

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-28 Thread Dongjoon Hyun
From Vlad's claims, the following guess is incorrect because the link is an upgrade from Apache ORC 1.9.5 to 1.9.6 which is a maintenance version upgrade. > I guess that the similar argument applies to ORC upgrade > (https://github.com/apache/spark/pull/50813). > [SPARK-52025][BUILD][3.5] Upgra

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Chao Sun
+1. Super excited by this initiative! On Wed, May 28, 2025 at 1:54 PM Yanbo Liang wrote: > +1 > > On Wed, May 28, 2025 at 12:34 PM huaxin gao > wrote: > >> +1 >> By unifying batch and low-latency streaming in Spark, we can eliminate >> the need for separate streaming engines, reducing system co

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Yanbo Liang
+1 On Wed, May 28, 2025 at 12:34 PM huaxin gao wrote: > +1 > By unifying batch and low-latency streaming in Spark, we can eliminate the > need for separate streaming engines, reducing system complexity and > operational cost. Excited to see this direction! > > On Wed, May 28, 2025 at 9:08 AM Mic

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread huaxin gao
+1 By unifying batch and low-latency streaming in Spark, we can eliminate the need for separate streaming engines, reducing system complexity and operational cost. Excited to see this direction! On Wed, May 28, 2025 at 9:08 AM Mich Talebzadeh wrote: > Hi, > > My point about "in real time applica

Unsubscribe

2025-05-28 Thread Gunturu Manohar
Unsubscribe

[ANNOUNCE] Apache Spark 4.0.0 released

2025-05-28 Thread Wenchen Fan
Hi All, We are happy to announce the availability of *Apache Spark 4.0.0*! Apache Spark 4.0.0 is the first release of the 4.x line. This release resolves more than 5100 tickets with contributions from more than 390 individuals. To download Spark 4.0.0, head over to the download page: https://spa

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Mich Talebzadeh
Hi, My point about "in a real-time application or data, there is no such thing as an answer that is late and correct. The timeliness is part of the application. If I get the right answer too slowly it becomes useless or wrong" is actually fundamental to *why* we need this Spark Structured

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Denny Lee
Hey Mich, Sorry, I may be missing something here but what does your definition here have to do with the SPIP? Perhaps add comments directly to the SPIP to provide context as the code snippet below is a direct copy from the SPIP itself. Thanks, Denny On Wed, May 28, 2025 at 06:48 Mich Talebz

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Mich Talebzadeh
Just to add a stronger definition of real time: the engineering definition of real time is roughly "fast enough to be interactive". However, I propose a stronger definition. In a real-time application or data, there is no such thing as an answer that is late and correct. The timeliness is part o

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-28 Thread Mich Talebzadeh
The current limitations in SSS come from micro-batching. If you are going to reduce micro-batching, this reduction must be balanced against the available processing capacity of the cluster to prevent back pressure and instability. In the case of Continuous Processing mode, a specific continuous trig

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-05-28 Thread Jungtaek Lim
Thanks for initiating this. I wonder whether we have any compatibility issues in any component. The SS area does not have an issue, but I don't quite remember whether the history server would be OK with this. What is the migration story for users who have been using LevelDB? I guess it could be probably