+1 (non-binding)

On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang <yumw...@apache.org> wrote:
> +1
>
> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> +1 looking forward to seeing this make progress!
>>
>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <yangji...@apache.org> wrote:
>>
>>> +1
>>>
>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>> > +1
>>> >
>>> > I am actually pretty excited to have this. Happy to see this being
>>> > proposed.
>>> >
>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:
>>> >
>>> > > +1. Super excited about this effort!
>>> > >
>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> +1 I support this SPIP because it simplifies data pipeline
>>> > >> management and enhances error detection.
>>> > >>
>>> > >>
>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >>> Excited to see this heading toward open source; materialized views
>>> > >>> and other features will bring a lot of value.
>>> > >>> +1 (non-binding)
>>> > >>>
>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>>> > >>>
>>> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>> > >>>> top of internal APIs for constructing and launching pipeline
>>> > >>>> executions. We'll have the option to expose these in the future.
>>> > >>>>
>>> > >>>> It would be worthwhile to understand the use cases in more depth
>>> > >>>> before exposing these, because APIs are one-way doors and can be
>>> > >>>> costly to maintain.
>>> > >>>>
>>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>> > >>>> khalidmammad...@gmail.com> wrote:
>>> > >>>>
>>> > >>>>> Looks great!
>>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.
>>> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
>>> > >>>>> or must it be executed via a separate shell command?
>>> > >>>>> As background, Databricks imposes a similar limitation: for some
>>> > >>>>> reason you cannot run normal Spark code and DLT on the same
>>> > >>>>> cluster, which forces you to use two clusters and increases cost
>>> > >>>>> and latency.
>>> > >>>>>
>>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>> > >>>>>
>>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>> > >>>>>>
>>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>> > >>>>>> model beyond single queries, to pipelines that keep multiple
>>> > >>>>>> datasets up to date. It introduces the ability to compose multiple
>>> > >>>>>> transformations into a single declarative dataflow graph.
>>> > >>>>>>
>>> > >>>>>> Declarative pipelines aim to simplify the development and
>>> > >>>>>> management of data pipelines, by removing the need for manual
>>> > >>>>>> orchestration of dependencies and making it possible to catch many
>>> > >>>>>> errors before any execution steps are launched.
>>> > >>>>>>
>>> > >>>>>> Declarative pipelines can include both batch and streaming
>>> > >>>>>> computations, leveraging Structured Streaming for stream
>>> > >>>>>> processing and new materialized view syntax for batch processing.
>>> > >>>>>> Tight integration with Spark SQL's analyzer enables deeper analysis
>>> > >>>>>> and earlier error detection than is achievable with more generic
>>> > >>>>>> frameworks.
>>> > >>>>>>
>>> > >>>>>> Let us know what you think!
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org