+1

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Wed, 9 Apr 2025 at 08:07, Peter Toth <peter.t...@gmail.com> wrote:

> +1
>
> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <pan3...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> Glad to see Spark SQL extended to streaming use cases.
>>
>> Thanks,
>> Cheng Pan
>>
>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <aokolnyc...@gmail.com> wrote:
>>
>> +1
>>
>> On Tue, 8 Apr 2025 at 23:36, Jacky Lee <qcsd2...@gmail.com> wrote:
>>
>>> +1 I'm delighted that it will be open-sourced, enabling greater integration with Iceberg/Delta to unlock more value.
>>>
>>> On Wed, 9 Apr 2025 at 10:47, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>> >
>>> > +1 looking forward to seeing this make progress!
>>> >
>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <yangji...@apache.org> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>> >> > +1
>>> >> >
>>> >> > I am actually pretty excited to have this. Happy to see this being proposed.
>>> >> >
>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:
>>> >> >
>>> >> > > +1. Super excited about this effort!
>>> >> > >
>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>> >> > >
>>> >> > >> +1 I support this SPIP because it simplifies data pipeline management and enhances error detection.
>>> >> > >>
>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
>>> >> > >>
>>> >> > >>> Excited to see this heading toward open source - materialized views and other features will bring a lot of value.
>>> >> > >>> +1 (non-binding)
>>> >> > >>>
>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>>> >> > >>>
>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on top of internal APIs for constructing and launching pipeline executions. We'll have the option to expose these in the future.
>>> >> > >>>>
>>> >> > >>>> It would be worthwhile to understand the use cases in more depth before exposing these, because APIs are one-way doors and can be costly to maintain.
>>> >> > >>>>
>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>>> >> > >>>>
>>> >> > >>>>> Looks great!
>>> >> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e. can I trigger a pipeline from *driver* code based on some condition etc., or must it be executed via a separate shell command?
>>> >> > >>>>> As background, Databricks imposes a similar limitation where you cannot run normal Spark code and DLT on the same cluster for some reason, which forces you to use two clusters, increasing cost and latency.
>>> >> > >>>>>
>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>> >> > >>>>>
>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>> >> > >>>>>>
>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model beyond single queries, to pipelines that keep multiple datasets up to date. It introduces the ability to compose multiple transformations into a single declarative dataflow graph.
>>> >> > >>>>>>
>>> >> > >>>>>> Declarative pipelines aim to simplify the development and management of data pipelines by removing the need for manual orchestration of dependencies and making it possible to catch many errors before any execution steps are launched.
>>> >> > >>>>>>
>>> >> > >>>>>> Declarative pipelines can include both batch and streaming computations, leveraging Structured Streaming for stream processing and new materialized view syntax for batch processing. Tight integration with Spark SQL's analyzer enables deeper analysis and earlier error detection than is achievable with more generic frameworks.
>>> >> > >>>>>>
>>> >> > >>>>>> Let us know what you think!
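To make the idea in the SPIP summary above concrete, a declarative pipeline could be sketched roughly as a set of dataset definitions whose dependencies Spark resolves into a single dataflow graph. This is only an illustration of the general shape (a streaming table fed by Structured Streaming plus a batch materialized view derived from it); the dataset names, the source, and the exact CREATE STREAMING TABLE / CREATE MATERIALIZED VIEW keywords here are assumptions made for the example, and the actual syntax is specified in the linked design doc.

    -- Hypothetical sketch only; the concrete syntax is defined in the SPIP design doc.

    -- A streaming table that is ingested incrementally via Structured Streaming
    -- (the source name is illustrative).
    CREATE STREAMING TABLE raw_orders AS
    SELECT * FROM STREAM(orders_source);

    -- A batch materialized view derived from the streaming table; the pipeline
    -- runner infers the dependency from the query and keeps the view up to date.
    CREATE MATERIALIZED VIEW daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY order_date;

Because both definitions live in one pipeline, the dependency of daily_revenue on raw_orders would be known before execution, which is where the earlier error detection described in the SPIP comes from.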