Re: [DISCUSS] SPIP: Declarative Pipelines

Kent Yao Wed, 09 Apr 2025 03:43:48 -0700

+1

Kent Yao


Sem <[email protected]> wrote on Wed, Apr 9, 2025 at 14:08:

> +1 (non-binding)
>
>
> On April 9, 2025 7:29:40 AM GMT+02:00, Rishab Joshi <[email protected]>
> wrote:
>
>> +1 Exciting.
>> Rishab Joshi
>>
>> On Tue, Apr 8, 2025, 10:04 PM Ruifeng Zheng <[email protected]> wrote:
>>
>>> +1
>>>
>>> On Wed, Apr 9, 2025 at 12:57 PM Denny Lee <[email protected]> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang <[email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> +1 looking forward to seeing this make progress!
>>>>>>
>>>>>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>>> > +1
>>>>>>> >
>>>>>>> > I am actually pretty excited to have this. Happy to see this being
>>>>>>> > proposed.
>>>>>>> >
>>>>>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>>>>>> >
>>>>>>> > > +1. Super excited about this effort!
>>>>>>> > >
>>>>>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
>>>>>>> > >
>>>>>>> > >> +1 I support this SPIP because it simplifies data pipeline management
>>>>>>> > >> and enhances error detection.
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
>>>>>>> > >>
>>>>>>> > >>> Excited to see this heading toward open source. Materialized views and
>>>>>>> > >>> other features will bring a lot of value.
>>>>>>> > >>> +1 (non-binding)
>>>>>>> > >>>
>>>>>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>>>>>>> > >>>
>>>>>>> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>>>>>> > >>>> top of internal APIs for constructing and launching pipeline
>>>>>>> > >>>> executions. We'll have the option to expose these in the future.
>>>>>>> > >>>>
>>>>>>> > >>>> It would be worthwhile to understand the use cases in more depth before
>>>>>>> > >>>> exposing these, because APIs are one-way doors and can be costly to
>>>>>>> > >>>> maintain.
>>>>>>> > >>>>
>>>>>>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <[email protected]> wrote:
>>>>>>> > >>>>
>>>>>>> > >>>>> Looks great!
>>>>>>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e., can
>>>>>>> > >>>>> I trigger a pipeline from *driver* code based on some condition etc., or
>>>>>>> > >>>>> must it be executed via a separate shell command?
>>>>>>> > >>>>> As background, Databricks imposes a similar limitation: for some reason
>>>>>>> > >>>>> you cannot run normal Spark code and DLT on the same cluster, which
>>>>>>> > >>>>> forces the use of two clusters and increases cost and latency.
>>>>>>> > >>>>>
>>>>>>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>>>>>>> > >>>>>
>>>>>>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>>>>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>>>>> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>>>>> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model
>>>>>>> > >>>>>> beyond single queries, to pipelines that keep multiple datasets up to
>>>>>>> > >>>>>> date. It introduces the ability to compose multiple transformations
>>>>>>> > >>>>>> into a single declarative dataflow graph.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Declarative pipelines aim to simplify the development and management
>>>>>>> > >>>>>> of data pipelines by removing the need for manual orchestration of
>>>>>>> > >>>>>> dependencies and making it possible to catch many errors before any
>>>>>>> > >>>>>> execution steps are launched.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Declarative pipelines can include both batch and streaming
>>>>>>> > >>>>>> computations, leveraging Structured Streaming for stream processing and
>>>>>>> > >>>>>> new materialized view syntax for batch processing. Tight integration
>>>>>>> > >>>>>> with Spark SQL's analyzer enables deeper analysis and earlier error
>>>>>>> > >>>>>> detection than is achievable with more generic frameworks.
>>>>>>> > >>>>>>
>>>>>>> > >>>>>> Let us know what you think!
>>>>>>> > >>>>>>
>>>>>>> > >>>>>>
>>>>>>> >
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: [email protected]
>>>>>>>
>>>>>>>
