+1

On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <pan3...@gmail.com> wrote:
> +1 (non-binding)
>
> Glad to see Spark SQL extended to streaming use cases.
>
> Thanks,
> Cheng Pan
>
> On Apr 9, 2025, at 14:43, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>
>> +1
>>
>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <qcsd2...@gmail.com> wrote:
>>
>>> +1 I'm delighted that it will be open-sourced, enabling greater integration with Iceberg/Delta to unlock more value.
>>>
>>> On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> +1 looking forward to seeing this make progress!
>>>>
>>>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <yangji...@apache.org> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> I am actually pretty excited to have this. Happy to see this being proposed.
>>>>>>
>>>>>> On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:
>>>>>>
>>>>>>> +1. Super excited about this effort!
>>>>>>>
>>>>>>> On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 I support this SPIP because it simplifies data pipeline management and enhances error detection.
>>>>>>>>
>>>>>>>> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Excited to see this heading toward open source; materialized views and other features will bring a lot of value.
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Khalid – the CLI in the current proposal will need to be built on top of internal APIs for constructing and launching pipeline executions. We'll have the option to expose these in the future.
>>>>>>>>>>
>>>>>>>>>> It would be worthwhile to understand the use cases in more depth before exposing these, because APIs are one-way doors and can be costly to maintain.
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Looks great!
>>>>>>>>>>> QQ: will users be able to run this pipeline from normal code? I.e., can I trigger a pipeline from *driver* code based on some condition, or must it be executed via a separate shell command?
>>>>>>>>>>> As background, Databricks imposes a similar limitation: you cannot run normal Spark code and DLT on the same cluster for some reason, which forces two clusters and increases cost and latency.
>>>>>>>>>>>
>>>>>>>>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all – starting a discussion thread for a SPIP that I've been working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>>>>>>>>
>>>>>>>>>>>> The SPIP proposes extending Spark's lazy, declarative execution model beyond single queries, to pipelines that keep multiple datasets up to date. It introduces the ability to compose multiple transformations into a single declarative dataflow graph.
>>>>>>>>>>>>
>>>>>>>>>>>> Declarative pipelines aim to simplify the development and management of data pipelines by removing the need for manual orchestration of dependencies and making it possible to catch many errors before any execution steps are launched.
>>>>>>>>>>>>
>>>>>>>>>>>> Declarative pipelines can include both batch and streaming computations, leveraging Structured Streaming for stream processing and new materialized view syntax for batch processing. Tight integration with Spark SQL's analyzer enables deeper analysis and earlier error detection than is achievable with more generic frameworks.
>>>>>>>>>>>>
>>>>>>>>>>>> Let us know what you think!
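To make the batch/streaming composition concrete, here is a rough sketch of what a two-step declarative pipeline could look like. The keywords and table names below are illustrative assumptions, not the syntax defined in the design doc linked above:

    -- Sketch only: assumed keywords and made-up table names, for illustration.
    -- A streaming computation that ingests events incrementally via Structured Streaming.
    CREATE STREAMING TABLE raw_orders AS
    SELECT * FROM STREAM orders_source;

    -- A batch computation defined as a materialized view over the table above.
    CREATE MATERIALIZED VIEW daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY order_date;

In a sketch like this, the dependency of daily_revenue on raw_orders would be inferred from the query itself, so the two definitions form a single dataflow graph with no manual orchestration, and analysis errors (for example, a misspelled column) could be surfaced before any execution step runs.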