Hi all – starting a discussion thread for a SPIP that I've been working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0> ].
The SPIP proposes extending Spark's lazy, declarative execution model beyond single queries, to pipelines that keep multiple datasets up to date. It introduces the ability to compose multiple transformations into a single declarative dataflow graph.

Declarative pipelines aim to simplify the development and management of data pipelines by removing the need for manual orchestration of dependencies and making it possible to catch many errors before any execution steps are launched.

Declarative pipelines can include both batch and streaming computations, leveraging Structured Streaming for stream processing and new materialized view syntax for batch processing. Tight integration with Spark SQL's analyzer enables deeper analysis and earlier error detection than is achievable with more generic frameworks.
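To give a flavor of the programming model, here's a minimal sketch of what a pipeline definition could look like, assuming illustrative syntax along the lines described above: a streaming table populated via Structured Streaming, plus a materialized view derived from it in batch. All keywords and names below are hypothetical; the concrete syntax is spelled out in the design doc.

    -- NOTE: purely illustrative; table, view, and column names are made up.

    -- A streaming table, populated incrementally via Structured Streaming.
    CREATE STREAMING TABLE raw_events AS
    SELECT * FROM STREAM source_events;

    -- A materialized view, computed in batch on top of the streaming table.
    -- The dependency on raw_events is inferred from the query itself.
    CREATE MATERIALIZED VIEW daily_event_counts AS
    SELECT date(event_time) AS day, count(*) AS event_count
    FROM raw_events
    GROUP BY date(event_time);

Because the whole dataflow graph is declared up front, the framework can infer that daily_event_counts depends on raw_events, analyze the full graph with Spark SQL's analyzer, and surface errors before any execution steps are launched.

Let us know what you think!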