+1 (non-binding)

Glad to see Spark SQL extended to streaming use cases.

Thanks,
Cheng Pan



> On Apr 9, 2025, at 14:43, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
> 
> +1
> 
> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <qcsd2...@gmail.com> wrote:
>> +1 I'm delighted that it will be open-sourced, enabling greater
>> integration with Iceberg/Delta to unlock more value.
>> 
>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote on Wed, Apr 9, 2025 at 10:47:
>> >
>> > +1 looking forward to seeing this make progress!
>> >
>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <yangji...@apache.org> wrote:
>> >>
>> >> +1
>> >>
>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>> >> > +1
>> >> >
>> >> > I am actually pretty excited to have this. Happy to see this being 
>> >> > proposed.
>> >> >
>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:
>> >> >
>> >> > > +1. Super excited about this effort!
>> >> > >
>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>> >> > >
>> >> > >> +1 I support this SPIP because it simplifies data pipeline 
>> >> > >> management and
>> >> > >> enhances error detection.
>> >> > >>
>> >> > >>
>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
>> >> > >>
>> >> > >>> Excited to see this heading toward open source — materialized views 
>> >> > >>> and
>> >> > >>> other features will bring a lot of value.
>> >> > >>> +1 (non-binding)
>> >> > >>>
>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>> >> > >>>
>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be built 
>> >> > >>>> on
>> >> > >>>> top of internal APIs for constructing and launching pipeline 
>> >> > >>>> executions.
>> >> > >>>> We'll have the option to expose these in the future.
>> >> > >>>>
>> >> > >>>> It would be worthwhile to understand the use cases in more depth 
>> >> > >>>> before
>> >> > >>>> exposing these, because APIs are one-way doors and can be costly to
>> >> > >>>> maintain.
>> >> > >>>>
>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>> >> > >>>>
>> >> > >>>>> Looks great!
>> >> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
>> >> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
>> >> > >>>>> etc., or must it be executed via a separate shell command?
>> >> > >>>>> As background, Databricks imposes a similar limitation whereby you
>> >> > >>>>> cannot run normal Spark code and DLT on the same cluster for some
>> >> > >>>>> reason, forcing you to use two clusters and increasing cost and
>> >> > >>>>> latency.
>> >> > >>>>>
>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>> >> > >>>>>
>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>> >> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: 
>> >> > >>>>>> [JIRA
>> >> > >>>>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>> >> > >>>>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
>> >> > >>>>>> ].
>> >> > >>>>>>
>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution 
>> >> > >>>>>> model
>> >> > >>>>>> beyond single queries, to pipelines that keep multiple datasets 
>> >> > >>>>>> up to date.
>> >> > >>>>>> It introduces the ability to compose multiple transformations 
>> >> > >>>>>> into a single
>> >> > >>>>>> declarative dataflow graph.
>> >> > >>>>>>
>> >> > >>>>>> Declarative pipelines aim to simplify the development and 
>> >> > >>>>>> management
>> >> > >>>>>> of data pipelines, by removing the need for manual 
>> >> > >>>>>> orchestration of
>> >> > >>>>>> dependencies and making it possible to catch many errors before 
>> >> > >>>>>> any
>> >> > >>>>>> execution steps are launched.
>> >> > >>>>>>
>> >> > >>>>>> Declarative pipelines can include both batch and streaming
>> >> > >>>>>> computations, leveraging Structured Streaming for stream 
>> >> > >>>>>> processing and new
>> >> > >>>>>> materialized view syntax for batch processing. Tight integration 
>> >> > >>>>>> with Spark
>> >> > >>>>>> SQL's analyzer enables deeper analysis and earlier error 
>> >> > >>>>>> detection than is
>> >> > >>>>>> achievable with more generic frameworks.
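>> >> > >>>>>>
>> >> > >>>>>> To make this concrete, a pipeline definition could look roughly like
>> >> > >>>>>> the sketch below. This is illustrative only – the exact SQL keywords
>> >> > >>>>>> are spelled out in the design doc, and the table names (orders_source,
>> >> > >>>>>> raw_orders, daily_totals) are made up. A streaming table ingests data
>> >> > >>>>>> incrementally via Structured Streaming, and a materialized view
>> >> > >>>>>> derived from it is kept up to date, with the dependency between the
>> >> > >>>>>> two inferred from the queries rather than declared by hand:
>> >> > >>>>>>
>> >> > >>>>>>   -- incremental ingestion backed by Structured Streaming
>> >> > >>>>>>   CREATE STREAMING TABLE raw_orders AS
>> >> > >>>>>>   SELECT * FROM STREAM(orders_source);
>> >> > >>>>>>
>> >> > >>>>>>   -- batch-style materialized view; the pipeline infers that it
>> >> > >>>>>>   -- depends on raw_orders and refreshes the two in the right order
>> >> > >>>>>>   CREATE MATERIALIZED VIEW daily_totals AS
>> >> > >>>>>>   SELECT order_date, sum(amount) AS total
>> >> > >>>>>>   FROM raw_orders
>> >> > >>>>>>   GROUP BY order_date;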
>> >> > >>>>>>
>> >> > >>>>>> Let us know what you think!
>> >> > >>>>>>
>> >> > >>>>>>
>> >> >
>> >>
>> 
