Re: [DISCUSS] SPIP: Declarative Pipelines

Martin Grund Wed, 09 Apr 2025 03:09:31 -0700

+1

On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <[email protected]> wrote:


> +1
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Wed, 9 Apr 2025 at 08:07, Peter Toth <[email protected]> wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <[email protected]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Glad to see Spark SQL extended to streaming use cases.
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <[email protected]> wrote:
>>>
>>> +1
>>>
>>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <[email protected]> wrote:
>>>
>>>> +1 I'm delighted that it will be open-sourced, enabling greater
>>>> integration with Iceberg/Delta to unlock more value.
>>>>
>>>> On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim <[email protected]> wrote:
>>>> >
>>>> > +1 looking forward to seeing this make progress!
>>>> >
>>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>> >> > +1
>>>> >> >
>>>> >> > I am actually pretty excited to have this. Happy to see this being proposed.
>>>> >> >
>>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>>> >> >
>>>> >> > > +1. Super excited about this effort!
>>>> >> > >
>>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
>>>> >> > >
>>>> >> > >> +1 I support this SPIP because it simplifies data pipeline management and
>>>> >> > >> enhances error detection.
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
>>>> >> > >>
>>>> >> > >>> Excited to see this heading toward open source; materialized views and
>>>> >> > >>> other features will bring a lot of value.
>>>> >> > >>> +1 (non-binding)
>>>> >> > >>>
>>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>>>> >> > >>>
>>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>>> >> > >>>> top of internal APIs for constructing and launching pipeline executions.
>>>> >> > >>>> We'll have the option to expose these in the future.
>>>> >> > >>>>
>>>> >> > >>>> It would be worthwhile to understand the use cases in more depth before
>>>> >> > >>>> exposing these, because APIs are one-way doors and can be costly to
>>>> >> > >>>> maintain.
>>>> >> > >>>>
>>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <[email protected]> wrote:
>>>> >> > >>>>
>>>> >> > >>>>> Looks great!
>>>> >> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e., can I
>>>> >> > >>>>> trigger a pipeline from *driver* code based on some condition etc., or
>>>> >> > >>>>> must it be executed via a separate shell command?
>>>> >> > >>>>> As background, Databricks imposes a similar limitation: for some reason
>>>> >> > >>>>> you cannot run normal Spark code and DLT on the same cluster, which
>>>> >> > >>>>> forces the use of two clusters and increases cost and latency.
>>>> >> > >>>>>
>>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>>>> >> > >>>>>
>>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>> >> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>> >> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>> >> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>> >> > >>>>>>
>>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model
>>>> >> > >>>>>> beyond single queries, to pipelines that keep multiple datasets up to date.
>>>> >> > >>>>>> It introduces the ability to compose multiple transformations into a single
>>>> >> > >>>>>> declarative dataflow graph.
>>>> >> > >>>>>>
>>>> >> > >>>>>> Declarative pipelines aim to simplify the development and management
>>>> >> > >>>>>> of data pipelines by removing the need for manual orchestration of
>>>> >> > >>>>>> dependencies and making it possible to catch many errors before any
>>>> >> > >>>>>> execution steps are launched.
>>>> >> > >>>>>>
>>>> >> > >>>>>> Declarative pipelines can include both batch and streaming
>>>> >> > >>>>>> computations, leveraging Structured Streaming for stream processing and new
>>>> >> > >>>>>> materialized view syntax for batch processing. Tight integration with Spark
>>>> >> > >>>>>> SQL's analyzer enables deeper analysis and earlier error detection than is
>>>> >> > >>>>>> achievable with more generic frameworks.
>>>> >> > >>>>>>
>>>> >> > >>>>>> Let us know what you think!
>>>> >> > >>>>>>
>>>> >> > >>>>>>
>>>> >> >
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe e-mail: [email protected]
>>>> >>
>>>>
>>>
