Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Jungtaek Lim
I agree, and I appreciate your input clarifying the term and the gap between it and the theoretical definition. I would just like to add some color here, for my 2 cents. It is not uncommon for a technical term to be reinterpreted and expanded. One well-known example is "exactly-once processin

Re: [DISCUSS][MINOR] Fix broken link in spark-website for SS Programming Guide

2025-05-30 Thread Wenchen Fan
+1 to fix this issue immediately. On Fri, May 30, 2025 at 3:16 PM Jerry Peng wrote: > +1 for fixing this immediately. > > Anish, thanks for pointing this issue out! > > On Fri, May 30, 2025 at 12:12 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote: > >> I’m +1 to fix this in website for 4

Re: [DISCUSS][MINOR] Fix broken link in spark-website for SS Programming Guide

2025-05-30 Thread Jerry Peng
+1 for fixing this immediately. Anish, thanks for pointing this issue out! On Fri, May 30, 2025 at 12:12 AM Jungtaek Lim wrote: > I’m +1 to fix this in website for 4.0.0 immediately. > > I got some inputs about this and they were unable to figure out the > correct page url. I’m mostly sure it w

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Jerry Peng
Mich, Sounds good. I will add the clarification to the SPIP. On Fri, May 30, 2025 at 3:47 AM Mich Talebzadeh wrote: > Hi Jerry, > > In essence, these definitions (hard or soft) help clarify that "real-time" > is *not a single, monolithic concept here*, but rather a spectrum defined > by the cr

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Denny Lee
+1 (non-binding) On Fri, May 30, 2025 at 9:17 AM xianjin wrote: > +1 > > On May 29, 2025, at 12:53 PM, Yuanjian Li wrote: > > +1 > > Kent Yao wrote on Wed, May 28, 2025 at 19:31: > >> +1, LGTM. >> >> Kent >> >> On Thu, May 29, 2025, Chao Sun wrote: >> >>> +1. Super excited by this initiati

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread xianjin
+1 On May 29, 2025, at 12:53 PM, Yuanjian Li wrote: +1 Kent Yao wrote on Wed, May 28, 2025 at 19:31: +1, LGTM. Kent On Thu, May 29, 2025, Chao Sun wrote: +1. Super excited by this initiative! On Wed, May 28, 2025 at 1:54 PM Yanbo Liang wrote: +1 On We

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-30 Thread Rozov, Vlad
Hi Dongjoon, I think priority should be given to the Apache Spark user and dev communities, and the decision should be made based on what most benefits both communities. That said, I am OK with all 3 possible scenarios and will go with the community decision and Spark policies. This is the list in the orde

Re: [DISCUSS][MINOR] Fix broken link in spark-website for SS Programming Guide

2025-05-30 Thread Jungtaek Lim
I’m +1 to fix this in the website for 4.0.0 immediately. I got some reports about this, and the reporters were unable to figure out the correct page URL. I’m fairly sure many other users will hit this as well. We could also fix this in the next maintenance release, but since we just released Apache Spark 4.0.0, i

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Mich Talebzadeh
Hi Jerry, In essence, these definitions (hard or soft) help clarify that "real-time" is *not a single, monolithic concept here*, but rather a spectrum defined by the criticality of timeliness and the systems under consideration. Common data processing solutions branded as "real-time" are typically ope

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Mark Hamstra
A soft real-time system still defines an interval or frame within which results should be available, and often provides explicit warning or error-handling mechanisms when frame rates are missed. I see nothing like that in the SPIP. Instead, the length of the underlying microbatches is specified in

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-30 Thread Bjørn Jørgensen
"*- distributing libraries with CVE is not a good development practice*" This version of Spark is only a minor upgrade of a maintained branch, and we now have a newer release, 4.0, for users who need it. Some time ago I updated FasterXML Jackson to fix one CVE: https://github.com/apache/spar

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Mich Talebzadeh
OK, fair points. This SPIP (Structured Streaming, in this context) admittedly does not meet the rigorous, academic definition of a soft real-time system, due to the lack of explicit, guaranteed deadlines and internal mechanisms for handling missed frames. Having said that, despite not being a "stri

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread L. C. Hsieh
Thanks to everyone in the community for your interest and support for this proposal. We've had extensive and constructive discussions both in this thread and in the SPIP document. These conversations have been positive and encouraging for moving in this direction. Special thanks to the SPIP authors

Re: [VOTE] Release Spark 3.5.6 (RC1)

2025-05-30 Thread Hyukjin Kwon
Could you take a look and see whether any CVE affects Spark directly? Let's stop guessing. Or you could open a vote. The general policy is already set, as I shared above. If you feel an exception has to be made, let's start a vote officially. From my take, I don't think it's usual

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Sakthi
+1 (non-binding) On Fri, May 30, 2025 at 2:39 PM Jules Damji wrote: > +1 (non-binding) > > On May 30, 2025, at 12:39 PM, Mark Hamstra wrote: > > > A soft real-time system still defines an interval or frame within which > results sho

Re: [DISCUSS] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-05-30 Thread Jules Damji
+1 (non-binding) On May 30, 2025, at 12:39 PM, Mark Hamstra wrote: A soft real-time system still defines an interval or frame within which results should be available, and often provides explicit warning or error-handling mechanisms when frame rate