Re: [DISCUSS] SPIP: Declarative Pipelines

Denny Lee Thu, 10 Apr 2025 13:30:41 -0700

+1 (non-binding)

On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang <[email protected]> wrote:


> +1
>
> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <[email protected]> wrote:
>
>> +1 looking forward to seeing this make progress!
>>
>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:
>>
>>> +1
>>>
>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>> > +1
>>> >
>>> > I am actually pretty excited to have this. Happy to see this being
>>> > proposed.
>>> >
>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>> >
>>> > > +1. Super excited about this effort!
>>> > >
>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
>>> > >
>>> > >> +1 I support this SPIP because it simplifies data pipeline management
>>> > >> and enhances error detection.
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
>>> > >>
>>> > >>> Excited to see this heading toward open source; materialized views
>>> > >>> and other features will bring a lot of value.
>>> > >>> +1 (non-binding)
>>> > >>>
>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>>> > >>>
>>> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>> > >>>> top of internal APIs for constructing and launching pipeline
>>> > >>>> executions. We'll have the option to expose these in the future.
>>> > >>>>
>>> > >>>> It would be worthwhile to understand the use cases in more depth
>>> > >>>> before exposing these, because APIs are one-way doors and can be
>>> > >>>> costly to maintain.
>>> > >>>>
>>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>> > >>>> [email protected]> wrote:
>>> > >>>>
>>> > >>>>> Looks great!
>>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
>>> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
>>> > >>>>> etc., or must it be executed via a separate shell command?
>>> > >>>>> As background, Databricks imposes a similar limitation whereby you
>>> > >>>>> cannot run normal Spark code and DLT on the same cluster, for some
>>> > >>>>> reason, which forces the use of two clusters and increases cost and
>>> > >>>>> latency.
>>> > >>>>>
>>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>>> > >>>>>
>>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>> > >>>>>>
>>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>> > >>>>>> model beyond single queries, to pipelines that keep multiple
>>> > >>>>>> datasets up to date. It introduces the ability to compose multiple
>>> > >>>>>> transformations into a single declarative dataflow graph.
>>> > >>>>>>
>>> > >>>>>> Declarative pipelines aim to simplify the development and
>>> > >>>>>> management of data pipelines by removing the need for manual
>>> > >>>>>> orchestration of dependencies and by making it possible to catch
>>> > >>>>>> many errors before any execution steps are launched.
>>> > >>>>>>
>>> > >>>>>> Declarative pipelines can include both batch and streaming
>>> > >>>>>> computations, leveraging Structured Streaming for stream processing
>>> > >>>>>> and new materialized view syntax for batch processing. Tight
>>> > >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
>>> > >>>>>> earlier error detection than is achievable with more generic
>>> > >>>>>> frameworks.
>>> > >>>>>>
>>> > >>>>>> Let us know what you think!
>>> > >>>>>>
