+1. I think it's better to make this change in a major Spark release.
On Fri, Nov 29, 2024 at 12:25 AM Martin Grund <mar...@databricks.com.invalid> wrote:

> At the risk of repeating what Herman said word for word :) I would like
> to call out the following:
>
> 1. The goal of setting the default is to guide users toward the Spark
> SQL APIs that have proven themselves over time. We shouldn't
> underestimate the power of the default. I would assume that we all agree
> that 99% of _new_ Spark users should not try to write code with RDDs.
>
> 2. Any user, organization, or vendor can leverage *all* of their
> existing code by simply changing *one* configuration during startup:
> switching spark.api.mode to classic (e.g., similar to ANSI mode; see the
> sketch after this quoted exchange). This means all existing RDD and
> library code just keeps working.
>
> 3. Creating a fractured user experience by using some logic to identify
> which API mode is used is not ideal. Many of the use cases I've seen that
> require additional jars (e.g., data sources, drivers) just work, because
> Spark already has the right abstractions. JARs used in the client-side
> part of the code also just work, as Herman said.
>
> Similarly, based on the experience of running Spark Connect in
> production, the coexistence of workloads running in classic mode and
> connect mode works fine.
>
> On Fri, Nov 29, 2024 at 3:18 AM Holden Karau <holden.ka...@gmail.com> wrote:
>
>> I would switch to +0 if connect were the default only for apps without
>> any user-provided jars / non-JVM apps.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>> On Thu, Nov 28, 2024 at 6:11 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>
>>> Given there is no plan to support RDDs, I'll update to -0.9.
>>>
>>> On Thu, Nov 28, 2024 at 6:00 PM Herman van Hovell <her...@databricks.com> wrote:
>>>
>>>> Hi Holden and Mridul,
>>>>
>>>> Just to be clear: what API parity are you expecting here? We have
>>>> parity for everything that is exposed in org.apache.spark.sql. Connect
>>>> does not support RDDs, SparkContext, etc., and there are currently no
>>>> plans to support them. We are considering adding a compatibility
>>>> layer, but it will be limited in scope. From running Connect in
>>>> production for the last year, we see that most users can migrate their
>>>> workloads without any problems.
>>>>
>>>> I do want to call out that this proposal is mostly aimed at how new
>>>> users will interact with Spark. Existing users, when they migrate
>>>> their application to Spark 4, only have to set a conf if it turns out
>>>> their application does not work. This should be a minor inconvenience
>>>> compared to the headaches that a new Scala version or other library
>>>> upgrades can cause.
>>>>
>>>> Since this is a breaking change, I do think this should be done in a
>>>> major version.
>>>>
>>>> At the risk of repeating the SPIP: using Connect as the default
>>>> brings a lot to the table (e.g., simplicity, easier upgrades,
>>>> extensibility), and I'd urge you to factor this into your decision
>>>> making as well.
>>>>
>>>> Happy Thanksgiving!
>>>>
>>>> Cheers,
>>>> Herman
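For concreteness, here is a minimal Scala sketch of the one-conf fallback Martin describes in point 2, assuming the proposed spark.api.mode flag is supplied at startup the way the SPIP sketches it; the launch command, app name, and input path are illustrative assumptions, not confirmed details:

    // Assumed launch command under the proposal:
    //   spark-submit --conf spark.api.mode=classic legacy-app.jar
    import org.apache.spark.sql.SparkSession

    object LegacyApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("legacy-rdd-job").getOrCreate()

        // With classic mode in effect, existing RDD/SparkContext code keeps working:
        val counts = spark.sparkContext
          .textFile("hdfs:///logs/input") // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(10).foreach(println)

        spark.stop()
      }
    }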
>>>>
>>>> On Thu, Nov 28, 2024 at 8:43 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I agree with Holden; I am leaning -1 on the proposal as well.
>>>>> Unlike the removal of deprecated features, which we align on a major
>>>>> version boundary, changing the default is something we can also do in
>>>>> a minor version, once there is API parity.
>>>>>
>>>>> Irrespective of which major/minor version we make the switch in,
>>>>> there could be user impact; minimizing this impact would be greatly
>>>>> appreciated by our users.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Wed, Nov 27, 2024 at 8:31 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>
>>>>>> -0.5: I don't think this is a good idea for JVM apps until we have
>>>>>> API parity. (Binding, but to be clear, not a veto.)
>>>>>>
>>>>>> On Wed, Nov 27, 2024 at 6:27 PM Xinrong Meng <xinr...@apache.org> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Thank you Herman!
>>>>>>>
>>>>>>> On Thu, Nov 28, 2024 at 3:37 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Wed, Nov 27, 2024 at 09:16 Denny Lee <denny.g....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> On Wed, Nov 27, 2024 at 3:07 AM Martin Grund <mar...@databricks.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> As part of the discussion on this topic, I would love to
>>>>>>>>>> highlight the work the community is currently doing to support
>>>>>>>>>> SparkML, which is traditionally very RDD-heavy, natively in
>>>>>>>>>> Spark Connect. Bobby's awesome work shows that, over time, we
>>>>>>>>>> can extend the features of Spark Connect and support workloads
>>>>>>>>>> that we previously thought could not be supported easily.
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/spark/pull/48791
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 27, 2024 at 11:42 AM Yang,Jie(INF) <yangji...@baidu.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>> From: Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>>>> Date: 2024-11-27 08:04:06
>>>>>>>>>>> Subject: [External] Re: Spark Connect the default API in Spark 4.0
>>>>>>>>>>> To: Bjørn Jørgensen <bjornjorgen...@gmail.com>
>>>>>>>>>>> Cc: Herman van Hovell <her...@databricks.com.invalid>; Spark dev list <dev@spark.apache.org>
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 25 Nov 2024 at 23:33, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 25 Nov 2024 at 14:48, Herman van Hovell <her...@databricks.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to start a discussion on "Spark Connect the
>>>>>>>>>>>>> default API in Spark 4.0".
>>>>>>>>>>>>>
>>>>>>>>>>>>> The rationale for this change is that Spark Connect brings a
>>>>>>>>>>>>> lot of improvements with respect to simplicity, stability,
>>>>>>>>>>>>> isolation, upgradability, and extensibility (all detailed in
>>>>>>>>>>>>> the SPIP). In a nutshell: we want to introduce a flag,
>>>>>>>>>>>>> spark.api.mode, that allows a user to choose between classic
>>>>>>>>>>>>> and connect mode, with connect as the default. A user can
>>>>>>>>>>>>> easily fall back to classic by setting spark.api.mode to
>>>>>>>>>>>>> classic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SPIP: https://docs.google.com/document/d/1C0kuQEliG78HujVwdnSk0wjNwHEDdwo2o8aVq7kbhTo/edit?tab=t.0#heading=h.r2c3xrbiklu3
>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-50411
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am looking forward to your feedback!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Herman
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Bjørn Jørgensen
>>>>>>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>>>>>>> Norge
>>>>>>>>>>>>
>>>>>>>>>>>> +47 480 94 297
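To make Herman's parity point concrete, here is a minimal Scala sketch that stays entirely within org.apache.spark.sql and should therefore run unchanged whether spark.api.mode resolves to connect or classic; the app name, input path, and column names are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ConnectByDefault {
      def main(args: Array[String]): Unit = {
        // Under the proposal this session would be a Connect session by default;
        // nothing below touches RDDs or SparkContext, so the API mode is invisible.
        val spark = SparkSession.builder().appName("connect-by-default").getOrCreate()

        val events = spark.read.json("/data/events") // hypothetical dataset
        events
          .groupBy(col("level"))
          .agg(count("*").as("n"))
          .orderBy(desc("n"))
          .show()

        spark.stop()
      }
    }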