Re: [DISCUSS] SPIP: Declarative Pipelines

Szehon Ho Wed, 09 Apr 2025 16:30:05 -0700

+1, really excited to see Materialized View finally make its way to
Spark, as many other ecosystem projects (Trino, StarRocks, soon Iceberg)
already support it.


Thanks
Szehon

On Wed, Apr 9, 2025 at 2:33 AM Martin Grund <[email protected]>
wrote:

> +1
>
> On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <[email protected]>
> wrote:
>
>> +1
>>
>> Dr Mich Talebzadeh,
>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>
>>
>>
>> On Wed, 9 Apr 2025 at 08:07, Peter Toth <[email protected]> wrote:
>>
>>> +1
>>>
>>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <[email protected]> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Glad to see Spark SQL extended to streaming use cases.
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>>
>>>>
>>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <[email protected]>
>>>> wrote:
>>>>
>>>> +1
>>>>
>>>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <[email protected]> wrote:
>>>>
>>>>> +1 I'm delighted that it will be open-sourced, enabling greater
>>>>> integration with Iceberg/Delta to unlock more value.
>>>>>
>>>>> On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim <[email protected]> wrote:
>>>>> >
>>>>> > +1 looking forward to seeing this make progress!
>>>>> >
>>>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]>
>>>>> wrote:
>>>>> >>
>>>>> >> +1
>>>>> >>
>>>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>> >> > +1
>>>>> >> >
>>>>> >> > I am actually pretty excited to have this. Happy to see this
>>>>> being proposed.
>>>>> >> >
>>>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>>>> >> >
>>>>> >> > > +1. Super excited about this effort!
>>>>> >> > >
>>>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <
>>>>> [email protected]> wrote:
>>>>> >> > >
>>>>> >> > >> +1 I support this SPIP because it simplifies data pipeline
>>>>> management and
>>>>> >> > >> enhances error detection.
>>>>> >> > >>
>>>>> >> > >>
>>>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <
>>>>> [email protected]> wrote:
>>>>> >> > >>
>>>>> >> > >>> Excited to see this heading toward open source — materialized
>>>>> views and
>>>>> >> > >>> other features will bring a lot of value.
>>>>> >> > >>> +1 (non-binding)
>>>>> >> > >>>
>>>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]>
>>>>> wrote:
>>>>> >> > >>>
>>>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be
>>>>> >> > >>>> built on top of internal APIs for constructing and launching
>>>>> >> > >>>> pipeline executions. We'll have the option to expose these in
>>>>> >> > >>>> the future.
>>>>> >> > >>>>
>>>>> >> > >>>> It would be worthwhile to understand the use cases in more
>>>>> >> > >>>> depth before exposing these, because APIs are one-way doors and
>>>>> >> > >>>> can be costly to maintain.
>>>>> >> > >>>>
>>>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>>> >> > >>>> [email protected]> wrote:
>>>>> >> > >>>>
>>>>> >> > >>>>> Looks great!
>>>>> >> > >>>>> QQ: will users be able to run this pipeline from normal code?
>>>>> >> > >>>>> I.e., can I trigger a pipeline from *driver* code based on some
>>>>> >> > >>>>> condition, or must it be executed via a separate shell command?
>>>>> >> > >>>>> As background, Databricks imposes a similar limitation: you
>>>>> >> > >>>>> cannot run normal Spark code and DLT on the same cluster, for
>>>>> >> > >>>>> some reason, which forces you to use two clusters, increasing
>>>>> >> > >>>>> cost and latency.
>>>>> >> > >>>>>
>>>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]>
>>>>> wrote:
>>>>> >> > >>>>>
>>>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>>> >> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>>> >> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>>> >> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>> >> > >>>>>>
>>>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>>>> >> > >>>>>> model beyond single queries, to pipelines that keep multiple
>>>>> >> > >>>>>> datasets up to date. It introduces the ability to compose
>>>>> >> > >>>>>> multiple transformations into a single declarative dataflow
>>>>> >> > >>>>>> graph.
>>>>> >> > >>>>>>
>>>>> >> > >>>>>> Declarative pipelines aim to simplify the development and
>>>>> >> > >>>>>> management of data pipelines by removing the need for manual
>>>>> >> > >>>>>> orchestration of dependencies and making it possible to catch
>>>>> >> > >>>>>> many errors before any execution steps are launched.
>>>>> >> > >>>>>>
>>>>> >> > >>>>>> Declarative pipelines can include both batch and streaming
>>>>> >> > >>>>>> computations, leveraging Structured Streaming for stream
>>>>> >> > >>>>>> processing and new materialized view syntax for batch
>>>>> >> > >>>>>> processing. Tight integration with Spark SQL's analyzer enables
>>>>> >> > >>>>>> deeper analysis and earlier error detection than is achievable
>>>>> >> > >>>>>> with more generic frameworks.
>>>>> >> > >>>>>>
>>>>> >> > >>>>>> Let us know what you think!
>>>>> >> > >>>>>>
>>>>> >> > >>>>>>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> ---------------------------------------------------------------------
>>>>> >> To unsubscribe e-mail: [email protected]
>>>>> >>
>>>>>
>>>>>
>>>>>
>>>>
