Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread Xiao Li
+1 (binding)

On Thu, Apr 10, 2025 at 00:57 John Zhuge  wrote:

> +1 (non-binding)
>
> On Wed, Apr 9, 2025 at 9:11 PM Jacky Lee  wrote:
>
>> +1 (binding)
>>
>> On Thu, Apr 10, 2025 at 12:00, Kent Yao  wrote:
>> >
>> > +1 (binding)
>> >
>> > Kent Yao
>> >
>> > On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:
>> >>
>> >> +1 (binding)
>> >>
>> >> On 2025/04/10 02:20:02 Cheng Pan wrote:
>> >> > +1 (non-binding)
>> >> >
>> >> > Thanks,
>> >> > Cheng Pan
>> >> >
>> >> >
>> >> >
>> >> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
>> >> > >
>> >> > > We started to get some votes on the discussion thread, so I'd like
>> to move to a formal vote on adding support for declarative pipelines.
>> >> > >
>> >> > > *Discussion thread: *
>> https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
>> >> > > *SPIP:*
>> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
>> >> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
>> >> > >
>> >> > > Please vote on the SPIP for the next 72 hours:
>> >> > >
>> >> > > [ ] +1: Accept the proposal as an official SPIP
>> >> > > [ ] +0
>> >> > > [ ] -1: I don’t think this is a good idea because …
>> >> > >
>> >> > > -Sandy
>> >> > >
>> >> >
>> >> >
>> >>
>> >> -
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>>
>>
>>
>
> --
> John Zhuge
>


Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread Prashant Singh
+1 (non-binding)

On Thu, Apr 10, 2025 at 9:46 AM Xiao Li  wrote:

> +1 (binding)
>
> On Thu, Apr 10, 2025 at 00:57 John Zhuge  wrote:
>
>> +1 (non-binding)
>>
>> On Wed, Apr 9, 2025 at 9:11 PM Jacky Lee  wrote:
>>
>>> +1 (binding)
>>>
>>> On Thu, Apr 10, 2025 at 12:00, Kent Yao  wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Kent Yao
>>> >
>>> > On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:
>>> >>
>>> >> +1 (binding)
>>> >>
>>> >> On 2025/04/10 02:20:02 Cheng Pan wrote:
>>> >> > +1 (non-binding)
>>> >> >
>>> >> > Thanks,
>>> >> > Cheng Pan
>>> >> >
>>> >> >
>>> >> >
>>> >> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
>>> >> > >
>>> >> > > We started to get some votes on the discussion thread, so I'd
>>> like to move to a formal vote on adding support for declarative pipelines.
>>> >> > >
>>> >> > > *Discussion thread: *
>>> https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
>>> >> > > *SPIP:*
>>> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
>>> >> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
>>> >> > >
>>> >> > > Please vote on the SPIP for the next 72 hours:
>>> >> > >
>>> >> > > [ ] +1: Accept the proposal as an official SPIP
>>> >> > > [ ] +0
>>> >> > > [ ] -1: I don’t think this is a good idea because …
>>> >> > >
>>> >> > > -Sandy
>>> >> > >
>>> >> >
>>> >> >
>>> >>
>>> >>
>>>
>>>
>>>
>>
>> --
>> John Zhuge
>>
>


Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread John Zhuge
+1 (non-binding)

On Wed, Apr 9, 2025 at 9:11 PM Jacky Lee  wrote:

> +1 (binding)
>
> On Thu, Apr 10, 2025 at 12:00, Kent Yao  wrote:
> >
> > +1 (binding)
> >
> > Kent Yao
> >
> > On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:
> >>
> >> +1 (binding)
> >>
> >> On 2025/04/10 02:20:02 Cheng Pan wrote:
> >> > +1 (non-binding)
> >> >
> >> > Thanks,
> >> > Cheng Pan
> >> >
> >> >
> >> >
> >> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
> >> > >
> >> > > We started to get some votes on the discussion thread, so I'd like
> to move to a formal vote on adding support for declarative pipelines.
> >> > >
> >> > > *Discussion thread: *
> https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
> >> > > *SPIP:*
> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
> >> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
> >> > >
> >> > > Please vote on the SPIP for the next 72 hours:
> >> > >
> >> > > [ ] +1: Accept the proposal as an official SPIP
> >> > > [ ] +0
> >> > > [ ] -1: I don’t think this is a good idea because …
> >> > >
> >> > > -Sandy
> >> > >
> >> >
> >> >
> >>
> >>
>
>
>

-- 
John Zhuge


Re: Security model update

2025-04-10 Thread Sean Owen
Sure, how about here though?
https://github.com/apache/spark-website/pull/602

On Mon, Apr 7, 2025 at 9:30 AM Arnout Engelen  wrote:

> On Mon, Apr 7, 2025 at 4:16 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> But I will note that that person’s reply to the ASF Security Team’s
>> initial comment smells like LLM output. Perhaps I am being unfair to them,
>> but I have read reports that bug bounties are now getting flooded with
>> credible-looking reports generated by AI that simply waste a lot of
>> developer time to check.
>>
>> And if that’s the case, then unfortunately some extra prose in the
>> Security guide is unlikely to help.
>>
>
> Yes and no: I agree that this report is particularly bad and likely
> LLM-generated. Nothing will prevent those. That said, having clear "this is
> how you decide whether the behaviour you see is problematic" instructions
> is still useful in swiftly dealing with those. And who knows, a few may even
> learn something - we *have* also seen LLM-assisted reports that actually
> uncovered legitimate issues (though tbh I'd rather receive someone's broken
> English than their LLM's word salad...)
>
>
> Kind regards,
>
> Arnout
>
>
>> On Apr 7, 2025, at 9:59 AM, Arnout Engelen  wrote:
>>
>> Hello dev@spark,
>>
>> Every now and then we get a 'security report' for Spark where the
>> reporter is shocked that 'spark', an 'engine for executing', allows users
>> to execute things. The latest in this category was
>> https://huntr.com/bounties/cc436d0b-e5d7-4394-9cff-0d4b1809a3f8.
>>
>> You already have a pretty great
>> https://spark.apache.org/docs/latest/security.html, but it might be good
>> to add a basic introduction to make explicit that users who are authorized
>> to execute can indeed execute code? I'm of course no Spark expert and you
>> can likely more clearly describe the security boundaries here. You could
>> take inspiration from https://flink.apache.org/what-is-flink/security/
>> or other pages linked from https://security.apache.org/projects/
>>
>>
>> Kind regards,
>>
>> --
>> Arnout Engelen
>> ASF Security Response
>> Apache Pekko PMC member, ASF Member
>> NixOS Committer
>> Independent Open Source consultant
>>
>>
>>
>
> --
> Arnout Engelen
> ASF Security Response
> Apache Pekko PMC member, ASF Member
> NixOS Committer
> Independent Open Source consultant
>


[VOTE] Release Spark 4.0.0 (RC4)

2025-04-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version
4.0.0.

The vote is open until April 15 (PST) and passes if a majority of the PMC votes
cast are +1, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 4.0.0
[ ] -1 Do not release this package because ...
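
As a rough illustration of the passing rule above, the tally could be sketched in Python as follows. The `Vote` type and the `release_vote_passes` function are hypothetical names made up for this example. The sketch assumes that only binding (PMC) votes count toward the result and that "majority" means more binding +1 votes than binding -1 votes; it is not the project's official tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True for PMC members (assumption: only these count)


def release_vote_passes(votes, min_plus_ones=3):
    """Return True if the binding votes satisfy the rule quoted above.

    Assumed interpretation: at least `min_plus_ones` binding +1 votes,
    and more binding +1 votes than binding -1 votes.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= min_plus_ones and plus > minus


if __name__ == "__main__":
    # Made-up voters, purely for illustration.
    example = [
        Vote("pmc-a", +1, True),
        Vote("pmc-b", +1, True),
        Vote("pmc-c", +1, True),
        Vote("contributor-d", +1, False),
        Vote("pmc-e", -1, True),
    ]
    print(release_vote_passes(example))
```

Non-binding votes are kept in the input but ignored by the tally, which mirrors how they are recorded on the list without deciding the outcome.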

To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v4.0.0-rc4 (commit
e0801d9d8e33cd8835f3e3beed99a3588c16b776)
https://github.com/apache/spark/tree/v4.0.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/
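
One way to check a downloaded release file against its published digest is sketched below in Python. This is a minimal sketch, assuming the digest is a SHA-512 hex string; the function names are made up for illustration, and the exact layout of the digest files in the directory above is not assumed, so the comparison simply ignores whitespace and letter case. Verifying the signatures against the KEYS file is a separate step that this sketch does not cover.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file against a published hex digest string.

    Whitespace and letter case in `published_hex` are ignored, since digest
    files are sometimes written in spaced groups (an assumption here).
    """
    expected = "".join(published_hex.split()).lower()
    return sha512_of(path) == expected
```

A caller would pass the local path of the downloaded artifact and the hex digest copied from the corresponding digest file; both are placeholders to be filled in by the tester.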

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1480/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-docs/

The list of bug fixes going into 4.0.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12353359

This release is using the release script of the tag v4.0.0-rc4.

FAQ

===================================
How can I help test this release?
===================================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).


Is anyone else seeing a hang in DataFrameSubquerySuite ("simple uncorrelated scalar subquery")? (EOM)

2025-04-10 Thread Asif Shahid



Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-10 Thread Sem
+1 (non-binding)

On April 9, 2025 7:29:40 AM GMT+02:00, Rishab Joshi  
wrote:
>+1 Exciting.
>Rishab Joshi
>
>On Tue, Apr 8, 2025, 10:04 PM Ruifeng Zheng  wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 12:57 PM Denny Lee  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang  wrote:
>>>
 +1

 On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:

> +1 looking forward to seeing this make progress!
>
> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie  wrote:
>
>> +1
>>
>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>> > +1
>> >
>> > I am actually pretty excited to have this. Happy to see this being
>> proposed.
>> >
>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun  wrote:
>> >
>> > > +1. Super excited about this effort!
>> > >
>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao 
>> wrote:
>> > >
>> > >> +1 I support this SPIP because it simplifies data pipeline
>> management and
>> > >> enhances error detection.
>> > >>
>> > >>
>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal 
>> wrote:
>> > >>
>> > >>> Excited to see this heading toward open source — materialized
>> views and
>> > >>> other features will bring a lot of value.
>> > >>> +1 (non-binding)
>> > >>>
>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza 
>> wrote:
>> > >>>
>> >  Hi Khalid – the CLI in the current proposal will need to be
>> built on
>> >  top of internal APIs for constructing and launching pipeline
>> executions.
>> >  We'll have the option to expose these in the future.
>> > 
>> >  It would be worthwhile to understand the use cases in more
>> depth before
>> >  exposing these, because APIs are one-way doors and can be
>> costly to
>> >  maintain.
>> > 
>> >  On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>> >  khalidmammad...@gmail.com> wrote:
>> > 
>> > > Looks great!
>> > > QQ: will user able to run this pipeline from normal code? I.e.
>> can I
>> > > trigger a pipeline from *driver* code based on some condition
>> etc. or
>> > > it must be executed via separate shell command ?
>> > > As a background Databricks imposes similar limitation where as
>> you
>> > > cannot run normal Spark code and DLT on the same cluster for
>> some reason
>> > > and forces to use two clusters increasing the cost and latency.
>> > >
>> > > On Sat, 5 Apr 2025 at 23:03, Sandy Ryza 
>> wrote:
>> > >
>> > >> Hi all – starting a discussion thread for a SPIP that I've
>> been
>> > >> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie
>> Yang: [JIRA
>> > >> ] [Doc
>> > >> <
>> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0
>> >
>> > >> ].
>> > >>
>> > >> The SPIP proposes extending Spark's lazy, declarative
>> execution model
>> > >> beyond single queries, to pipelines that keep multiple
>> datasets up to date.
>> > >> It introduces the ability to compose multiple transformations
>> into a single
>> > >> declarative dataflow graph.
>> > >>
>> > >> Declarative pipelines aim to simplify the development and
>> management
>> > >> of data pipelines, by  removing the need for manual
>> orchestration of
>> > >> dependencies and making it possible to catch many errors
>> before any
>> > >> execution steps are launched.
>> > >>
>> > >> Declarative pipelines can include both batch and streaming
>> > >> computations, leveraging Structured Streaming for stream
>> processing and new
>> > >> materialized view syntax for batch processing. Tight
>> integration with Spark
>> > >> SQL's analyzer enables deeper analysis and earlier error
>> detection than is
>> > >> achievable with more generic frameworks.
>> > >>
>> > >> Let us know what you think!
>> > >>
>> > >>
>> >
>>
>>
>>


Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread Liu Cao
+1 (non-binding)

On Thu, Apr 10, 2025 at 9:51 AM Prashant Singh 
wrote:

> +1 (non-binding)
>
> On Thu, Apr 10, 2025 at 9:46 AM Xiao Li  wrote:
>
>> +1 (binding)
>>
>> On Thu, Apr 10, 2025 at 00:57 John Zhuge  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Wed, Apr 9, 2025 at 9:11 PM Jacky Lee  wrote:
>>>
 +1 (binding)

 On Thu, Apr 10, 2025 at 12:00, Kent Yao  wrote:
 >
 > +1 (binding)
 >
 > Kent Yao
 >
 > On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:
 >>
 >> +1 (binding)
 >>
 >> On 2025/04/10 02:20:02 Cheng Pan wrote:
 >> > +1 (non-binding)
 >> >
 >> > Thanks,
 >> > Cheng Pan
 >> >
 >> >
 >> >
 >> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
 >> > >
 >> > > We started to get some votes on the discussion thread, so I'd
 like to move to a formal vote on adding support for declarative pipelines.
 >> > >
 >> > > *Discussion thread: *
 https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
 >> > > *SPIP:*
 https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
 >> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
 >> > >
 >> > > Please vote on the SPIP for the next 72 hours:
 >> > >
 >> > > [ ] +1: Accept the proposal as an official SPIP
 >> > > [ ] +0
 >> > > [ ] -1: I don’t think this is a good idea because …
 >> > >
 >> > > -Sandy
 >> > >
 >> >
 >> >
 >>
 >>



>>>
>>> --
>>> John Zhuge
>>>
>>

-- 

Liu Cao


Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-10 Thread Denny Lee
+1 (non-binding)

On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang  wrote:

> +1
>
> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim 
> wrote:
>
>> +1 looking forward to seeing this make progress!
>>
>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie  wrote:
>>
>>> +1
>>>
>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>> > +1
>>> >
>>> > I am actually pretty excited to have this. Happy to see this being
>>> proposed.
>>> >
>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun  wrote:
>>> >
>>> > > +1. Super excited about this effort!
>>> > >
>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao 
>>> wrote:
>>> > >
>>> > >> +1 I support this SPIP because it simplifies data pipeline
>>> management and
>>> > >> enhances error detection.
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal 
>>> wrote:
>>> > >>
>>> > >>> Excited to see this heading toward open source — materialized
>>> views and
>>> > >>> other features will bring a lot of value.
>>> > >>> +1 (non-binding)
>>> > >>>
>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza 
>>> wrote:
>>> > >>>
>>> >  Hi Khalid – the CLI in the current proposal will need to be built
>>> on
>>> >  top of internal APIs for constructing and launching pipeline
>>> executions.
>>> >  We'll have the option to expose these in the future.
>>> > 
>>> >  It would be worthwhile to understand the use cases in more depth
>>> before
>>> >  exposing these, because APIs are one-way doors and can be costly
>>> to
>>> >  maintain.
>>> > 
>>> >  On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>> >  khalidmammad...@gmail.com> wrote:
>>> > 
>>> > > Looks great!
>>> > > QQ: will user able to run this pipeline from normal code? I.e.
>>> can I
>>> > > trigger a pipeline from *driver* code based on some condition
>>> etc. or
>>> > > it must be executed via separate shell command ?
>>> > > As a background Databricks imposes similar limitation where as
>>> you
>>> > > cannot run normal Spark code and DLT on the same cluster for
>>> some reason
>>> > > and forces to use two clusters increasing the cost and latency.
>>> > >
>>> > > On Sat, 5 Apr 2025 at 23:03, Sandy Ryza 
>>> wrote:
>>> > >
>>> > >> Hi all – starting a discussion thread for a SPIP that I've been
>>> > >> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>> [JIRA
>>> > >> ] [Doc
>>> > >> <
>>> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0
>>> >
>>> > >> ].
>>> > >>
>>> > >> The SPIP proposes extending Spark's lazy, declarative execution
>>> model
>>> > >> beyond single queries, to pipelines that keep multiple datasets
>>> up to date.
>>> > >> It introduces the ability to compose multiple transformations
>>> into a single
>>> > >> declarative dataflow graph.
>>> > >>
>>> > >> Declarative pipelines aim to simplify the development and
>>> management
>>> > >> of data pipelines, by  removing the need for manual
>>> orchestration of
>>> > >> dependencies and making it possible to catch many errors before
>>> any
>>> > >> execution steps are launched.
>>> > >>
>>> > >> Declarative pipelines can include both batch and streaming
>>> > >> computations, leveraging Structured Streaming for stream
>>> processing and new
>>> > >> materialized view syntax for batch processing. Tight
>>> integration with Spark
>>> > >> SQL's analyzer enables deeper analysis and earlier error
>>> detection than is
>>> > >> achievable with more generic frameworks.
>>> > >>
>>> > >> Let us know what you think!
>>> > >>
>>> > >>
>>> >
>>>
>>>
>>>


Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-10 Thread Sandy Ryza
Hi Khalid – the CLI in the current proposal will need to be built on top of
internal APIs for constructing and launching pipeline executions. We'll
have the option to expose these in the future.

It would be worthwhile to understand the use cases in more depth before
exposing these, because APIs are one-way doors and can be costly to
maintain.

On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov 
wrote:

> Looks great!
> QQ: will users be able to run this pipeline from normal code? I.e., can I
> trigger a pipeline from *driver* code based on some condition etc., or
> must it be executed via a separate shell command?
> As background, Databricks imposes a similar limitation: you cannot run
> normal Spark code and DLT on the same cluster for some reason, which
> forces you to use two clusters, increasing cost and latency.
>
> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza  wrote:
>
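>> Hi all – starting a discussion thread for a SPIP that I've been working
>> on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
>> ].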
>>
>> The SPIP proposes extending Spark's lazy, declarative execution model
>> beyond single queries, to pipelines that keep multiple datasets up to date.
>> It introduces the ability to compose multiple transformations into a single
>> declarative dataflow graph.
>>
>> Declarative pipelines aim to simplify the development and management of
>> data pipelines by removing the need for manual orchestration of
>> dependencies and making it possible to catch many errors before any
>> execution steps are launched.
>>
>> Declarative pipelines can include both batch and streaming computations,
>> leveraging Structured Streaming for stream processing and new materialized
>> view syntax for batch processing. Tight integration with Spark SQL's
>> analyzer enables deeper analysis and earlier error detection than is
>> achievable with more generic frameworks.
>>
>> Let us know what you think!
>>
>>
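
To make the dataflow-graph idea quoted above a little more concrete, here is a minimal Python sketch. It is not the API proposed in the SPIP: the `Pipeline` class, its `dataset` decorator, and its `plan` method are hypothetical names made up for illustration, and the SQL strings are placeholders that are never executed. It shows only the general technique: declare datasets together with their inputs, derive the execution order from those declarations, and reject unknown inputs or cycles before any step runs.

```python
from graphlib import CycleError, TopologicalSorter


class Pipeline:
    """Toy registry of datasets and the inputs each one reads from."""

    def __init__(self, sources=()):
        self._sources = set(sources)   # external tables assumed to exist already
        self._inputs = {}              # dataset name -> tuple of input names
        self._queries = {}             # dataset name -> function describing it

    def dataset(self, name, inputs=()):
        """Decorator that declares a dataset; nothing is executed here."""
        def register(fn):
            self._inputs[name] = tuple(inputs)
            self._queries[name] = fn
            return fn
        return register

    def plan(self):
        """Validate the graph and return dataset names in execution order."""
        known = self._sources | set(self._inputs)
        for name, inputs in self._inputs.items():
            missing = [i for i in inputs if i not in known]
            if missing:
                raise ValueError(f"{name} reads undefined inputs: {missing}")
        # Only edges between declared datasets matter for ordering.
        graph = {
            name: [i for i in inputs if i in self._inputs]
            for name, inputs in self._inputs.items()
        }
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as err:
            raise ValueError(f"cyclic dependency: {err.args[1]}") from err


pipeline = Pipeline(sources=["raw_events"])


# Declared out of order on purpose: the plan is derived from the inputs,
# not from the order in which the definitions appear.
@pipeline.dataset("daily_counts", inputs=["clean_events"])
def daily_counts():
    return "SELECT day, count(*) FROM clean_events GROUP BY day"


@pipeline.dataset("clean_events", inputs=["raw_events"])
def clean_events():
    return "SELECT * FROM raw_events WHERE user_id IS NOT NULL"


if __name__ == "__main__":
    print(pipeline.plan())
```

Because `plan` only inspects the declarations, a misspelled input or a circular definition surfaces as a `ValueError` before any query would be launched, which is the "catch many errors before any execution steps" property the message describes.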


Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread Walaa Eldin Moustafa
+1 (non-binding)

On Thu, Apr 10, 2025 at 6:52 PM Liu Cao  wrote:

> +1 (non-binding)
>
> On Thu, Apr 10, 2025 at 9:51 AM Prashant Singh 
> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Apr 10, 2025 at 9:46 AM Xiao Li  wrote:
>>
>>> +1 (binding)
>>>
>>> On Thu, Apr 10, 2025 at 00:57 John Zhuge  wrote:
>>>
 +1 (non-binding)

 On Wed, Apr 9, 2025 at 9:11 PM Jacky Lee  wrote:

> +1 (binding)
>
> On Thu, Apr 10, 2025 at 12:00, Kent Yao  wrote:
> >
> > +1 (binding)
> >
> > Kent Yao
> >
> > On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:
> >>
> >> +1 (binding)
> >>
> >> On 2025/04/10 02:20:02 Cheng Pan wrote:
> >> > +1 (non-binding)
> >> >
> >> > Thanks,
> >> > Cheng Pan
> >> >
> >> >
> >> >
> >> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
> >> > >
> >> > > We started to get some votes on the discussion thread, so I'd
> like to move to a formal vote on adding support for declarative pipelines.
> >> > >
> >> > > *Discussion thread: *
> https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
> >> > > *SPIP:*
> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
> >> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
> >> > >
> >> > > Please vote on the SPIP for the next 72 hours:
> >> > >
> >> > > [ ] +1: Accept the proposal as an official SPIP
> >> > > [ ] +0
> >> > > [ ] -1: I don’t think this is a good idea because …
> >> > >
> >> > > -Sandy
> >> > >
> >> >
> >> >
> >>
> >>
> >>
>
>
>

 --
 John Zhuge

>>>
>
> --
>
> Liu Cao
>
>
>


Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-10 Thread Walaa Eldin Moustafa
This sounds quite interesting.

+1 to what Szehon said about excitement around MVs. Happy to collaborate.

On Wed, Apr 9, 2025 at 5:29 PM Ángel Álvarez Pascua <
angel.alvarez.pas...@gmail.com> wrote:

> +1 (non-binding)
>
> On Thu, Apr 10, 2025, 1:50, Burak Yavuz  wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 4:33 PM Szehon Ho  wrote:
>>
>>> +1, really excited to see Materialized Views finally make their way
>>> to Spark, as many other ecosystem projects (Trino, StarRocks, soon Iceberg)
>>> already support them.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Wed, Apr 9, 2025 at 2:33 AM Martin Grund
>>>  wrote:
>>>
 +1

 On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> +1
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>
>
>
>
>
> On Wed, 9 Apr 2025 at 08:07, Peter Toth  wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Glad to see Spark SQL extended to streaming use cases.
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi 
>>> wrote:
>>>
>>> +1
>>>
>>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee  wrote:
>>>
 +1 I'm delighted that it will be open-sourced, enabling greater
 integration with Iceberg/Delta to unlock more value.

 On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim  wrote:
 >
 > +1 looking forward to seeing this make progress!
 >
 > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie 
 wrote:
 >>
 >> +1
 >>
 >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
 >> > +1
 >> >
 >> > I am actually pretty excited to have this. Happy to see this
 being proposed.
 >> >
 >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun 
 wrote:
 >> >
 >> > > +1. Super excited about this effort!
 >> > >
 >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <
 huaxin.ga...@gmail.com> wrote:
 >> > >
 >> > >> +1 I support this SPIP because it simplifies data pipeline
 management and
 >> > >> enhances error detection.
 >> > >>
 >> > >>
 >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <
 dkbis...@gmail.com> wrote:
 >> > >>
 >> > >>> Excited to see this heading toward open source —
 materialized views and
 >> > >>> other features will bring a lot of value.
 >> > >>> +1 (non-binding)
 >> > >>>
 >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <
 sa...@apache.org> wrote:
 >> > >>>
 >> >  Hi Khalid – the CLI in the current proposal will need to
 be built on
 >> >  top of internal APIs for constructing and launching
 pipeline executions.
 >> >  We'll have the option to expose these in the future.
 >> > 
 >> >  It would be worthwhile to understand the use cases in
 more depth before
 >> >  exposing these, because APIs are one-way doors and can be
 costly to
 >> >  maintain.
 >> > 
 >> >  On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
 >> >  khalidmammad...@gmail.com> wrote:
 >> > 
 >> > > Looks great!
 >> > > QQ: will user able to run this pipeline from normal
 code? I.e. can I
 >> > > trigger a pipeline from *driver* code based on some
 condition etc. or
 >> > > it must be executed via separate shell command ?
 >> > > As a background Databricks imposes similar limitation
 where as you
 >> > > cannot run normal Spark code and DLT on the same cluster
 for some reason
 >> > > and forces to use two clusters increasing the cost and
 latency.
 >> > >
 >> > > On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <
 sa...@apache.org> wrote:
 >> > >
 >> > >> Hi all – starting a discussion thread for a SPIP that
 I've been
 >> > >> working on with Chao Sun, Kent Yao, Yuming Wang, and
 Jie Yang: [JIRA
 >> > >> ]
 [Doc
 >> > >> <
 https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0
 >
 >> > >> ].
 >> > >>
 >> > >> The SPIP proposes extending Spark's lazy, declarative
 execution model
 >> > >> beyond single queries, to pipelines that keep multiple
 datasets up to date.
 >> > >> It int

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-10 Thread Kent Yao
+1 (binding)

Kent Yao

On Thu, Apr 10, 2025 at 10:27, Yang Jie  wrote:

> +1 (binding)
>
> On 2025/04/10 02:20:02 Cheng Pan wrote:
> > +1 (non-binding)
> >
> > Thanks,
> > Cheng Pan
> >
> >
> >
> > > On Apr 9, 2025, at 22:22, Sandy Ryza  wrote:
> > >
> > > We started to get some votes on the discussion thread, so I'd like to
> move to a formal vote on adding support for declarative pipelines.
> > >
> > > *Discussion thread: *
> https://lists.apache.org/thread/lsv8f829ps0bog41fjoqc45xk7m574ly
> > > *SPIP:*
> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4
> > > *JIRA:* https://issues.apache.org/jira/browse/SPARK-51727
> > >
> > > Please vote on the SPIP for the next 72 hours:
> > >
> > > [ ] +1: Accept the proposal as an official SPIP
> > > [ ] +0
> > > [ ] -1: I don’t think this is a good idea because …
> > >
> > > -Sandy
> > >
> >
> >
>
>
>


Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-10 Thread Jungtaek Lim
+1 looking forward to seeing this make progress!

On Wed, Apr 9, 2025 at 11:32 AM Yang Jie  wrote:

> +1
>
> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
> > +1
> >
> > I am actually pretty excited to have this. Happy to see this being
> proposed.
> >
> > On Wed, 9 Apr 2025 at 01:55, Chao Sun  wrote:
> >
> > > +1. Super excited about this effort!
> > >
> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao 
> wrote:
> > >
> > >> +1 I support this SPIP because it simplifies data pipeline management
> and
> > >> enhances error detection.
> > >>
> > >>
> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal 
> wrote:
> > >>
> > >>> Excited to see this heading toward open source — materialized views
> and
> > >>> other features will bring a lot of value.
> > >>> +1 (non-binding)
> > >>>
> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza  wrote:
> > >>>
> >  Hi Khalid – the CLI in the current proposal will need to be built on
> >  top of internal APIs for constructing and launching pipeline
> executions.
> >  We'll have the option to expose these in the future.
> > 
> >  It would be worthwhile to understand the use cases in more depth
> before
> >  exposing these, because APIs are one-way doors and can be costly to
> >  maintain.
> > 
> >  On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
> >  khalidmammad...@gmail.com> wrote:
> > 
> > > Looks great!
> > > QQ: will users be able to run this pipeline from normal code? I.e., can I
> > > trigger a pipeline from *driver* code based on some condition etc., or
> > > must it be executed via a separate shell command?
> > > As background, Databricks imposes a similar limitation: you cannot run
> > > normal Spark code and DLT on the same cluster for some reason, which
> > > forces you to use two clusters, increasing cost and latency.
> > >
> > > On Sat, 5 Apr 2025 at 23:03, Sandy Ryza  wrote:
> > >
> > >> Hi all – starting a discussion thread for a SPIP that I've been
> > >> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
> [JIRA
> > >> ] [Doc
> > >> <
> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0
> >
> > >> ].
> > >>
> > >> The SPIP proposes extending Spark's lazy, declarative execution
> model
> > >> beyond single queries, to pipelines that keep multiple datasets
> up to date.
> > >> It introduces the ability to compose multiple transformations
> into a single
> > >> declarative dataflow graph.
> > >>
> > >> Declarative pipelines aim to simplify the development and
> management
> > >> of data pipelines, by  removing the need for manual orchestration
> of
> > >> dependencies and making it possible to catch many errors before
> any
> > >> execution steps are launched.
> > >>
> > >> Declarative pipelines can include both batch and streaming
> > >> computations, leveraging Structured Streaming for stream
> processing and new
> > >> materialized view syntax for batch processing. Tight integration
> with Spark
> > >> SQL's analyzer enables deeper analysis and earlier error
> detection than is
> > >> achievable with more generic frameworks.
> > >>
> > >> Let us know what you think!
> > >>
> > >>
> >
>
>
>