Re: [DISCUSS] SPIP: Declarative Pipelines

Jacky Lee Tue, 08 Apr 2025 23:44:19 -0700

+1 I'm delighted that it will be open-sourced, enabling greater
integration with Iceberg/Delta to unlock more value.


Jungtaek Lim <[email protected]> wrote on Wed, Apr 9, 2025 at 10:47:
>
> +1 looking forward to seeing this make progress!
>
> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:
>>
>> +1
>>
>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>> > +1
>> >
>> > I am actually pretty excited to have this. Happy to see this being
>> > proposed.
>> >
>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>> >
>> > > +1. Super excited about this effort!
>> > >
>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
>> > >
>> > >> +1 I support this SPIP because it simplifies data pipeline management
>> > >> and enhances error detection.
>> > >>
>> > >>
>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
>> > >>
>> > >>> Excited to see this heading toward open source — materialized views and
>> > >>> other features will bring a lot of value.
>> > >>> +1 (non-binding)
>> > >>>
>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>> > >>>
>> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>> > >>>> top of internal APIs for constructing and launching pipeline
>> > >>>> executions. We'll have the option to expose these in the future.
>> > >>>>
>> > >>>> It would be worthwhile to understand the use cases in more depth
>> > >>>> before exposing these, because APIs are one-way doors and can be
>> > >>>> costly to maintain.
>> > >>>>
>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>> > >>>> [email protected]> wrote:
>> > >>>>
>> > >>>>> Looks great!
>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e., can
>> > >>>>> I trigger a pipeline from *driver* code based on some condition, or
>> > >>>>> must it be executed via a separate shell command?
>> > >>>>> As background, Databricks imposes a similar limitation: for some
>> > >>>>> reason you cannot run normal Spark code and DLT on the same cluster,
>> > >>>>> which forces you to use two clusters, increasing cost and latency.
>> > >>>>>
>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>> > >>>>>
>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>> > >>>>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>> > >>>>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
>> > >>>>>> ].
>> > >>>>>>
>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model
>> > >>>>>> beyond single queries, to pipelines that keep multiple datasets up to
>> > >>>>>> date. It introduces the ability to compose multiple transformations
>> > >>>>>> into a single declarative dataflow graph.
>> > >>>>>>
>> > >>>>>> Declarative pipelines aim to simplify the development and management
>> > >>>>>> of data pipelines by removing the need for manual orchestration of
>> > >>>>>> dependencies and making it possible to catch many errors before any
>> > >>>>>> execution steps are launched.
>> > >>>>>>
>> > >>>>>> Declarative pipelines can include both batch and streaming
>> > >>>>>> computations, leveraging Structured Streaming for stream processing
>> > >>>>>> and new materialized view syntax for batch processing. Tight
>> > >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
>> > >>>>>> earlier error detection than is achievable with more generic
>> > >>>>>> frameworks.
>> > >>>>>>
>> > >>>>>> Let us know what you think!
>> > >>>>>>
>> > >>>>>>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: [email protected]
>>

