Hi Holden and Mridul,

Just to be clear: what API parity are you expecting here? We have parity for everything that is exposed in org.apache.spark.sql. Connect does not support RDDs, SparkContext, etc. There are currently no plans to support these. We are considering adding a compatibility layer, but that will be limited in scope. From running Connect in production for the last year, we see that most users can migrate their workloads without any problems.
I do want to call out that this proposal is mostly aimed at how new users will interact with Spark. Existing users who migrate an application to Spark 4 have to set a conf if it turns out their application does not work. This should be a minor inconvenience compared to the headaches that a new Scala version or other library upgrades can cause. Since this is a breaking change, I do think it should be done in a major version.

At the risk of repeating the SPIP: using Connect as the default brings a lot to the table (e.g. simplicity, easier upgrades, extensibility). I'd urge you to also factor this into your decision making.

Happy Thanksgiving!

Cheers,
Herman

On Thu, Nov 28, 2024 at 8:43 PM Mridul Muralidharan <mri...@gmail.com> wrote:

> Hi,
>
> I agree with Holden, I am leaning -1 on the proposal as well.
> Unlike removal of deprecated features, which we align on a major version
> boundary, changing the default is something we can do in a minor version as
> well - once there is API parity.
>
> Irrespective of which major/minor version we make the switch in - there
> could be user impact; minimizing this impact would be greatly appreciated
> by our users.
>
> Regards,
> Mridul
>
> On Wed, Nov 27, 2024 at 8:31 PM Holden Karau <holden.ka...@gmail.com>
> wrote:
>
>> -0.5: I don’t think this is a good idea for JVM apps until we have API
>> parity. (Binding but to be clear not a veto)
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>> On Wed, Nov 27, 2024 at 6:27 PM Xinrong Meng <xinr...@apache.org> wrote:
>>
>>> +1
>>>
>>> Thank you Herman!
>>>
>>> On Thu, Nov 28, 2024 at 3:37 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Nov 27, 2024 at 09:16 Denny Lee <denny.g....@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Wed, Nov 27, 2024 at 3:07 AM Martin Grund
>>>>> <mar...@databricks.com.invalid> wrote:
>>>>>
>>>>>> As part of the discussion on this topic, I would love to highlight
>>>>>> the work that the community is currently doing to support SparkML,
>>>>>> which is traditionally very RDD-heavy, natively in Spark Connect.
>>>>>> Bobby's awesome work shows that, over time, we can extend the
>>>>>> features of Spark Connect and support workloads that we previously
>>>>>> thought could not be supported easily.
>>>>>>
>>>>>> https://github.com/apache/spark/pull/48791
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Wed, Nov 27, 2024 at 11:42 AM Yang,Jie(INF)
>>>>>> <yangji...@baidu.com.invalid> wrote:
>>>>>>
>>>>>>> +1
>>>>>>> -------- Original message --------
>>>>>>> From: Hyukjin Kwon <gurwls...@apache.org>
>>>>>>> Time: 2024-11-27 08:04:06
>>>>>>> Subject: [External email] Re: Spark Connect the default API in Spark 4.0
>>>>>>> To: Bjørn Jørgensen <bjornjorgen...@gmail.com>;
>>>>>>> Cc: Herman van Hovell <her...@databricks.com.invalid>; Spark dev list
>>>>>>> <dev@spark.apache.org>;
>>>>>>> +1
>>>>>>>
>>>>>>> On Mon, 25 Nov 2024 at 23:33, Bjørn Jørgensen <
>>>>>>> bjornjorgen...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Mon, 25 Nov 2024 at 14:48, Herman van Hovell
>>>>>>>> <her...@databricks.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I would like to start a discussion on "Spark Connect the default
>>>>>>>>> API in Spark 4.0".
>>>>>>>>>
>>>>>>>>> The rationale for this change is that Spark Connect brings a lot
>>>>>>>>> of improvements with respect to simplicity, stability, isolation,
>>>>>>>>> upgradability, and extensibility (all detailed in the SPIP).
>>>>>>>>> In a nutshell: we want to introduce a flag, spark.api.mode, that
>>>>>>>>> allows a user to choose between classic and connect mode, the
>>>>>>>>> default being connect. A user can easily fall back to classic by
>>>>>>>>> setting spark.api.mode to classic.
>>>>>>>>>
>>>>>>>>> SPIP:
>>>>>>>>> https://docs.google.com/document/d/1C0kuQEliG78HujVwdnSk0wjNwHEDdwo2o8aVq7kbhTo/edit?tab=t.0#heading=h.r2c3xrbiklu3
>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-50411
>>>>>>>>>
>>>>>>>>> I am looking forward to your feedback!
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Herman
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bjørn Jørgensen
>>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>>> Norge
>>>>>>>>
>>>>>>>> +47 480 94 297
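
For readers skimming the thread: the fallback described in the original proposal comes down to one conf, spark.api.mode, with the values classic and connect. Below is a minimal sketch of how an existing application might pin it at launch, assuming the flag lands as proposed and is passed through spark-submit's regular `--conf key=value` option. The helper `spark_submit_command` and the script name `my_rdd_job.py` are hypothetical and only for illustration; nothing here is taken from the SPIP itself.

```python
import shlex

# The two values named in the proposal. The flag is still a proposal in this
# thread, so treat both the name and the values as subject to change.
API_MODES = ("classic", "connect")


def spark_submit_command(app, api_mode="connect"):
    """Build a spark-submit argv that pins spark.api.mode (hypothetical helper)."""
    if api_mode not in API_MODES:
        raise ValueError(
            f"spark.api.mode must be one of {API_MODES}, got {api_mode!r}"
        )
    # --conf key=value is spark-submit's generic way to pass a Spark conf.
    return ["spark-submit", "--conf", f"spark.api.mode={api_mode}", app]


# An existing RDD-based job opting back into the classic API.
cmd = spark_submit_command("my_rdd_job.py", api_mode="classic")
print(shlex.join(cmd))
```

The sketch only builds the command line and does not start Spark. Whether the conf would also be honored in spark-defaults.conf or on a session builder is not stated in this thread.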