+1 (non-binding)

On Thu, Apr 10, 2025, 1:50, Burak Yavuz <brk...@gmail.com> wrote:

> +1
>
> On Wed, Apr 9, 2025 at 4:33 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> +1 really excited to finally see Materialized View make its way to Spark,
>> as many other ecosystem projects (Trino, StarRocks, soon Iceberg) already
>> support it.
>>
>> Thanks
>> Szehon
>>
>> On Wed, Apr 9, 2025 at 2:33 AM Martin Grund <mar...@databricks.com.invalid>
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> Dr Mich Talebzadeh,
>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>
>>>>    view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 9 Apr 2025 at 08:07, Peter Toth <peter.t...@gmail.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <pan3...@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Glad to see Spark SQL extended to streaming use cases.
>>>>>>
>>>>>> Thanks,
>>>>>> Cheng Pan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <aokolnyc...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <qcsd2...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 I'm delighted that it will be open-sourced, enabling greater
>>>>>>> integration with Iceberg/Delta to unlock more value.
>>>>>>>
>>>>>>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote on Wed, Apr 9, 2025 at 10:47:
>>>>>>> >
>>>>>>> > +1 looking forward to seeing this make progress!
>>>>>>> >
>>>>>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <yangji...@apache.org>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> +1
>>>>>>> >>
>>>>>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>>> >> > +1
>>>>>>> >> >
>>>>>>> >> > I am actually pretty excited to have this. Happy to see this
>>>>>>> being proposed.
>>>>>>> >> >
>>>>>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org>
>>>>>>> wrote:
>>>>>>> >> >
>>>>>>> >> > > +1. Super excited about this effort!
>>>>>>> >> > >
>>>>>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <
>>>>>>> huaxin.ga...@gmail.com> wrote:
>>>>>>> >> > >
>>>>>>> >> > >> +1 I support this SPIP because it simplifies data pipeline
>>>>>>> >> > >> management and enhances error detection.
>>>>>>> >> > >>
>>>>>>> >> > >>
>>>>>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <
>>>>>>> dkbis...@gmail.com> wrote:
>>>>>>> >> > >>
>>>>>>> >> > >>> Excited to see this heading toward open source — materialized
>>>>>>> >> > >>> views and other features will bring a lot of value.
>>>>>>> >> > >>> +1 (non-binding)
>>>>>>> >> > >>>
>>>>>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <
>>>>>>> sa...@apache.org> wrote:
>>>>>>> >> > >>>
>>>>>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be
>>>>>>> >> > >>>> built on top of internal APIs for constructing and launching
>>>>>>> >> > >>>> pipeline executions. We'll have the option to expose these in
>>>>>>> >> > >>>> the future.
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> It would be worthwhile to understand the use cases in more
>>>>>>> >> > >>>> depth before exposing these, because APIs are one-way doors
>>>>>>> >> > >>>> and can be costly to maintain.
>>>>>>> >> > >>>>
>>>>>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>>>>> >> > >>>> khalidmammad...@gmail.com> wrote:
>>>>>>> >> > >>>>
>>>>>>> >> > >>>>> Looks great!
>>>>>>> >> > >>>>> QQ: will users be able to run this pipeline from normal
>>>>>>> >> > >>>>> code? I.e. can I trigger a pipeline from *driver* code based
>>>>>>> >> > >>>>> on some condition etc., or must it be executed via a
>>>>>>> >> > >>>>> separate shell command?
>>>>>>> >> > >>>>> As background, Databricks imposes a similar limitation where
>>>>>>> >> > >>>>> you cannot run normal Spark code and DLT on the same cluster
>>>>>>> >> > >>>>> for some reason, forcing you to use two clusters and
>>>>>>> >> > >>>>> increasing cost and latency.
>>>>>>> >> > >>>>>
>>>>>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org>
>>>>>>> wrote:
>>>>>>> >> > >>>>>
>>>>>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've
>>>>>>> >> > >>>>>> been working on with Chao Sun, Kent Yao, Yuming Wang, and
>>>>>>> >> > >>>>>> Jie Yang:
>>>>>>> >> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>>>>> >> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>>> >> > >>>>>>
>>>>>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative
>>>>>>> >> > >>>>>> execution model beyond single queries, to pipelines that
>>>>>>> >> > >>>>>> keep multiple datasets up to date. It introduces the
>>>>>>> >> > >>>>>> ability to compose multiple transformations into a single
>>>>>>> >> > >>>>>> declarative dataflow graph.
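
For a concrete flavor of the dataflow-graph idea above, here is a minimal
sketch in plain PySpark. It only illustrates the dependency structure such a
pipeline would declare; the table and function names are made up for
illustration, and the actual registration syntax is whatever the SPIP doc
defines.

    from pyspark.sql import DataFrame, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each function declares one dataset; referencing another dataset's
    # function expresses a dependency edge in the graph.
    def raw_events() -> DataFrame:
        # Source dataset (hypothetical table name).
        return spark.read.table("events")

    def daily_counts() -> DataFrame:
        # Derived dataset: depends on raw_events(). In a declarative pipeline
        # this edge would be inferred and kept up to date by the framework
        # rather than orchestrated by hand.
        return raw_events().groupBy("event_date").count()
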
>>>>>>> >> > >>>>>>
>>>>>>> >> > >>>>>> Declarative pipelines aim to simplify the development and
>>>>>>> >> > >>>>>> management of data pipelines, by removing the need for
>>>>>>> >> > >>>>>> manual orchestration of dependencies and making it possible
>>>>>>> >> > >>>>>> to catch many errors before any execution steps are
>>>>>>> >> > >>>>>> launched.
>>>>>>> >> > >>>>>>
>>>>>>> >> > >>>>>> Declarative pipelines can include both batch and streaming
>>>>>>> >> > >>>>>> computations, leveraging Structured Streaming for stream
>>>>>>> >> > >>>>>> processing and new materialized view syntax for batch
>>>>>>> >> > >>>>>> processing. Tight integration with Spark SQL's analyzer
>>>>>>> >> > >>>>>> enables deeper analysis and earlier error detection than is
>>>>>>> >> > >>>>>> achievable with more generic frameworks.
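
To make the batch/streaming combination concrete, another small sketch. Only
the Structured Streaming calls below are existing API; the input path and
schema are placeholders, and the materialized-view DDL in the trailing
comment is a guess at the shape of the proposed syntax, not its final form.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Streaming side: incremental ingestion with Structured Streaming
    # (existing API; path and schema are placeholders).
    events = (
        spark.readStream
        .format("json")
        .schema("user_id STRING, event_date DATE")
        .load("/data/events")
    )

    # Batch side: the SPIP adds materialized view syntax; roughly something
    # like the statement below, though the exact DDL is defined by the SPIP.
    # spark.sql("""
    #     CREATE MATERIALIZED VIEW daily_counts AS
    #     SELECT event_date, count(*) AS n FROM events GROUP BY event_date
    # """)
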
>>>>>>> >> > >>>>>>
>>>>>>> >> > >>>>>> Let us know what you think!
>>>>>>> >> > >>>>>>
>>>>>>> >> > >>>>>>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >>
>>>>>>>
>>>>>>>
>>>>>>
