Re: [DISCUSS] SPIP: Declarative Pipelines

Ruifeng Zheng Tue, 08 Apr 2025 22:04:38 -0700

+1

On Wed, Apr 9, 2025 at 12:57 PM Denny Lee <[email protected]> wrote:


> +1 (non-binding)
>
> On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang <[email protected]> wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <
>> [email protected]> wrote:
>>
>>> +1 looking forward to seeing this make progress!
>>>
>>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>> > +1
>>>> >
>>>> > I am actually pretty excited to have this. Happy to see this being
>>>> > proposed.
>>>> >
>>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>>> >
>>>> > > +1. Super excited about this effort!
>>>> > >
>>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
>>>> > >
>>>> > >> +1 I support this SPIP because it simplifies data pipeline management
>>>> > >> and enhances error detection.
>>>> > >>
>>>> > >>
>>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
>>>> > >>
>>>> > >>> Excited to see this heading toward open source; materialized views
>>>> > >>> and other features will bring a lot of value.
>>>> > >>> +1 (non-binding)
>>>> > >>>
>>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>>>> > >>>
>>>> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>>> > >>>> top of internal APIs for constructing and launching pipeline
>>>> > >>>> executions. We'll have the option to expose these in the future.
>>>> > >>>>
>>>> > >>>> It would be worthwhile to understand the use cases in more depth
>>>> > >>>> before exposing these, because APIs are one-way doors and can be
>>>> > >>>> costly to maintain.
>>>> > >>>>
>>>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>> > >>>> [email protected]> wrote:
>>>> > >>>>
>>>> > >>>>> Looks great!
>>>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
>>>> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
>>>> > >>>>> or must it be executed via a separate shell command?
>>>> > >>>>> As background, Databricks imposes a similar limitation: you cannot
>>>> > >>>>> run normal Spark code and DLT on the same cluster for some reason,
>>>> > >>>>> which forces you to use two clusters, increasing cost and latency.
>>>> > >>>>>
>>>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>>>> > >>>>>
>>>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>> > >>>>>>
>>>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>>> > >>>>>> model beyond single queries, to pipelines that keep multiple
>>>> > >>>>>> datasets up to date. It introduces the ability to compose multiple
>>>> > >>>>>> transformations into a single declarative dataflow graph.
>>>> > >>>>>>
>>>> > >>>>>> Declarative pipelines aim to simplify the development and
>>>> > >>>>>> management of data pipelines by removing the need for manual
>>>> > >>>>>> orchestration of dependencies and by making it possible to catch
>>>> > >>>>>> many errors before any execution steps are launched.
>>>> > >>>>>>
>>>> > >>>>>> Declarative pipelines can include both batch and streaming
>>>> > >>>>>> computations, leveraging Structured Streaming for stream processing
>>>> > >>>>>> and new materialized view syntax for batch processing. Tight
>>>> > >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
>>>> > >>>>>> earlier error detection than is achievable with more generic
>>>> > >>>>>> frameworks.
>>>> > >>>>>>
>>>> > >>>>>> Let us know what you think!
>>>> > >>>>>>
>>>> > >>>>>>
>>>> >
>>>>
>>>>
>>>>
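
The proposal message above describes the core idea: each dataset is declared together with its inputs, and the framework derives the execution order and rejects an invalid graph before running anything. Below is a minimal, hypothetical sketch of that idea in plain Python (3.9+, using the standard-library `graphlib`). The `dataset` decorator, `DATASETS`, `validate`, and `run` are illustrative names only and are NOT the API proposed in the SPIP. Spark, materialized views, and streaming are deliberately left out.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical registry: dataset name -> (input names, transformation).
# Illustrative only; this is NOT the API proposed in the SPIP.
DATASETS = {}


def dataset(name, inputs=()):
    """Declare a dataset by naming its inputs; nothing runs at declaration time."""
    def register(fn):
        DATASETS[name] = (tuple(inputs), fn)
        return fn
    return register


def validate(graph, sources):
    """Check the whole graph before execution: unknown inputs, then cycles."""
    errors = []
    for name, (inputs, _) in graph.items():
        for dep in inputs:
            if dep not in graph and dep not in sources:
                errors.append(f"{name}: unknown input {dep!r}")
    order = []
    if not errors:
        deps = {name: set(inputs) for name, (inputs, _) in graph.items()}
        try:
            # Execution order is derived from the declared inputs.
            order = list(TopologicalSorter(deps).static_order())
        except CycleError as exc:
            errors.append(f"cycle detected: {exc.args[1]}")
    return errors, order


def run(graph, sources):
    """Validate first, then compute every dataset in dependency order."""
    errors, order = validate(graph, sources)
    if errors:
        raise ValueError("; ".join(errors))
    results = dict(sources)
    for name in order:
        if name in results:  # source data is already available
            continue
        inputs, fn = graph[name]
        results[name] = fn(*(results[dep] for dep in inputs))
    return results


# Declared "out of order" on purpose: no manual orchestration is needed.
@dataset("order_total", inputs=["clean_orders"])
def order_total(orders):
    return sum(o["amount"] for o in orders)


@dataset("clean_orders", inputs=["raw_orders"])
def clean_orders(raw):
    return [o for o in raw if o["amount"] > 0]


sources = {"raw_orders": [{"amount": 10}, {"amount": -3}, {"amount": 5}]}
results = run(DATASETS, sources)
print(results["order_total"])  # 15
```

Swapping the two declarations gives the same result, and a reference to a dataset that does not exist (or a dependency cycle) raises `ValueError` before any transformation runs, loosely mirroring the "catch many errors before any execution steps are launched" goal.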
