Hi all – starting a discussion thread for a SPIP that I've been working on
with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
<https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
<https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
].

The SPIP proposes extending Spark's lazy, declarative execution model
beyond single queries, to pipelines that keep multiple datasets up to date.
It introduces the ability to compose multiple transformations into a single
declarative dataflow graph.
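
To make that concrete, here is a minimal, hypothetical sketch of what
declaring such a graph could feel like. The `dataset` decorator, the
registry, and every name below are illustrative stand-ins of mine, not
the API proposed in the SPIP doc; the point is only that definitions are
registered lazily, the dependency graph is derived from them, and the
graph can be validated before anything runs:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

    # Registry mapping dataset name -> (dependency names, transformation).
    _definitions = {}

    def dataset(name, depends_on=()):
        """Register a transformation declaratively without running it."""
        def wrap(fn):
            _definitions[name] = (list(depends_on), fn)
            return fn
        return wrap

    @dataset("raw_events")
    def raw_events():
        return spark.range(100).withColumnRenamed("id", "event_id")

    @dataset("daily_counts", depends_on=["raw_events"])
    def daily_counts(raw):
        return raw.groupBy((raw.event_id % 7).alias("day")).count()

    def run_all():
        # Validate the whole graph before executing any step.
        for name, (deps, _) in _definitions.items():
            missing = [d for d in deps if d not in _definitions]
            if missing:
                raise ValueError(f"{name} depends on undefined datasets: {missing}")
        cache = {}
        def materialize(name):
            if name not in cache:
                deps, fn = _definitions[name]
                cache[name] = fn(*(materialize(d) for d in deps))
            return cache[name]
        return {name: materialize(name) for name in _definitions}

Because the whole graph is known up front, a mistake like referencing an
undefined dataset surfaces in run_all() before any Spark job is
launched, which is the kind of early error detection the proposal aims
for.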

Declarative pipelines aim to simplify the development and management of
data pipelines by removing the need for manual orchestration of
dependencies and by making it possible to catch many errors before any
execution steps are launched.

Declarative pipelines can include both batch and streaming computations,
leveraging Structured Streaming for stream processing and new materialized
view syntax for batch processing. Tight integration with Spark SQL's
analyzer enables deeper analysis and earlier error detection than is
achievable with more generic frameworks.
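
For a feel of how the two modes sit side by side today, the sketch
below uses only current public APIs: Structured Streaming keeps one
result incrementally updated, while the batch result must be rerun by
hand. The names are mine; the final saveAsTable step is what the
proposed materialized view syntax would subsume, declared once and kept
up to date by the pipeline rather than re-triggered manually:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count

    spark = SparkSession.builder.appName("mixed-modes").getOrCreate()

    def bucket_counts(df):
        # Shared transformation logic, usable in batch and streaming plans.
        return df.groupBy((col("value") % 10).alias("bucket")).agg(count("*").alias("n"))

    # Streaming half: Structured Streaming incrementally maintains the result.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    stream_query = (
        bucket_counts(stream)
        .writeStream.outputMode("complete")
        .format("memory")
        .queryName("bucket_counts_stream")
        .start()
    )

    # Batch half: today this rerun is manual; a declarative pipeline would
    # keep the materialized result up to date from the same definition.
    batch = spark.range(0, 1000).withColumnRenamed("id", "value")
    bucket_counts(batch).write.mode("overwrite").saveAsTable("bucket_counts_batch")

    stream_query.stop()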

Let us know what you think!
