Re: [DISCUSS] SPIP: Declarative Pipelines

huaxin gao Tue, 08 Apr 2025 09:47:33 -0700

+1 I support this SPIP because it simplifies data pipeline management and
enhances error detection.

On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:

> Excited to see this heading toward open source — materialized views and
> other features will bring a lot of value.
> +1 (non-binding)
>
> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
>
>> Hi Khalid – the CLI in the current proposal will need to be built on top
>> of internal APIs for constructing and launching pipeline executions. We'll
>> have the option to expose these in the future.
>>
>> It would be worthwhile to understand the use cases in more depth before
>> exposing these, because APIs are one-way doors and can be costly to
>> maintain.
>>
>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>> [email protected]> wrote:
>>
>>> Looks great!
>>> QQ: will users be able to run this pipeline from normal code? I.e., can I
>>> trigger a pipeline from *driver* code based on some condition, or
>>> must it be executed via a separate shell command?
>>> As background, Databricks imposes a similar limitation: for some reason
>>> you cannot run normal Spark code and DLT on the same cluster, which
>>> forces you to use two clusters, increasing cost and latency.
>>>
>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
>>>
>>>> Hi all – starting a discussion thread for a SPIP that I've been working
>>>> on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
>>>> ].
>>>>
>>>> The SPIP proposes extending Spark's lazy, declarative execution model
>>>> beyond single queries, to pipelines that keep multiple datasets up to date.
>>>> It introduces the ability to compose multiple transformations into a single
>>>> declarative dataflow graph.
>>>>
>>>> Declarative pipelines aim to simplify the development and management of
>>>> data pipelines by removing the need for manual orchestration of
>>>> dependencies and by making it possible to catch many errors before any
>>>> execution steps are launched.
>>>>
>>>> Declarative pipelines can include both batch and streaming
>>>> computations, leveraging Structured Streaming for stream processing and new
>>>> materialized view syntax for batch processing. Tight integration with Spark
>>>> SQL's analyzer enables deeper analysis and earlier error detection than is
>>>> achievable with more generic frameworks.
>>>>
>>>> Let us know what you think!
>>>>
>>>>