+1. I think it's better to make this change in a major Spark release.
On Fri, Nov 29, 2024 at 12:25 AM Martin Grund <mar...@databricks.com.invalid> wrote:

> At the risk of repeating what Herman said word for word :) I would like
> to call out the following:
>
> 1. The goal of setting the default is to guide users toward the Spark
> SQL APIs that have proven themselves over time. We shouldn't
> underestimate the power of the default. I would assume that we all agree
> that 99% of _new_ Spark users should not try to write code with RDDs.
>
> 2. Any user, organization, or vendor can leverage *all* of their
> existing code by simply changing *one* configuration during startup:
> switching spark.api.mode to classic (e.g., similar to ANSI mode; see the
> sketch after this quoted exchange). This means all existing RDD and
> library code just keeps working.
>
> 3. Creating a fractured user experience by using some logic to identify
> which API mode is used is not ideal. Many of the use cases I've seen that
> require additional jars (e.g., data sources, drivers) just work, because
> Spark already has the right abstractions. JARs used in the client-side
> part of the code also just work, as Herman said.
>
> Similarly, based on the experience of running Spark Connect in
> production, the coexistence of workloads running in classic mode and
> connect mode works fine.
>
> On Fri, Nov 29, 2024 at 3:18 AM Holden Karau <holden.ka...@gmail.com> wrote:
>
>> I would switch to +0 if connect were the default only for apps without
>> any user-provided jars / non-JVM apps.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>> On Thu, Nov 28, 2024 at 6:11 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>
>>> Given there is no plan to support RDDs, I'll update to -0.9.
>>>
>>> On Thu, Nov 28, 2024 at 6:00 PM Herman van Hovell <her...@databricks.com> wrote:
>>>
>>>> Hi Holden and Mridul,
>>>>
>>>> Just to be clear: what API parity are you expecting here? We have
>>>> parity for everything that is exposed in org.apache.spark.sql. Connect
>>>> does not support RDDs, SparkContext, etc., and there are currently no
>>>> plans to support them. We are considering adding a compatibility
>>>> layer, but it will be limited in scope. From running Connect in
>>>> production for the last year, we see that most users can migrate their
>>>> workloads without any problems.
>>>>
>>>> I do want to call out that this proposal is mostly aimed at how new
>>>> users will interact with Spark. Existing users, when they migrate
>>>> their application to Spark 4, only have to set a conf if it turns out
>>>> their application does not work. This should be a minor inconvenience
>>>> compared to the headaches that a new Scala version or other library
>>>> upgrades can cause.
>>>>
>>>> Since this is a breaking change, I do think this should be done in a
>>>> major version.
>>>>
>>>> At the risk of repeating the SPIP: using Connect as the default
>>>> brings a lot to the table (e.g., simplicity, easier upgrades,
>>>> extensibility), and I'd urge you to factor this into your decision
>>>> making as well.
>>>>
>>>> Happy Thanksgiving!
>>>>
>>>> Cheers,
>>>> Herman
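For concreteness, here is a minimal Scala sketch of the one-conf fallback Martin describes in point 2, assuming the proposed spark.api.mode flag is supplied at startup the way the SPIP sketches it; the launch command, app name, and input path are illustrative assumptions, not confirmed details:

    // Assumed launch command under the proposal:
    //   spark-submit --conf spark.api.mode=classic legacy-app.jar
    import org.apache.spark.sql.SparkSession

    object LegacyApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("legacy-rdd-job").getOrCreate()

        // With classic mode in effect, existing RDD/SparkContext code keeps working:
        val counts = spark.sparkContext
          .textFile("hdfs:///logs/input") // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.take(10).foreach(println)

        spark.stop()
      }
    }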
>>>>
>>>> On Thu, Nov 28, 2024 at 8:43 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I agree with Holden; I am leaning -1 on the proposal as well.
>>>>> Unlike the removal of deprecated features, which we align on a major
>>>>> version boundary, changing the default is something we can also do in
>>>>> a minor version, once there is API parity.
>>>>>
>>>>> Irrespective of which major/minor version we make the switch in,
>>>>> there could be user impact; minimizing this impact would be greatly
>>>>> appreciated by our users.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Wed, Nov 27, 2024 at 8:31 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>
>>>>>> -0.5: I don't think this is a good idea for JVM apps until we have
>>>>>> API parity. (Binding, but to be clear, not a veto.)
>>>>>>
>>>>>> On Wed, Nov 27, 2024 at 6:27 PM Xinrong Meng <xinr...@apache.org> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Thank you Herman!
>>>>>>>
>>>>>>> On Thu, Nov 28, 2024 at 3:37 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Wed, Nov 27, 2024 at 09:16 Denny Lee <denny.g....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> On Wed, Nov 27, 2024 at 3:07 AM Martin Grund <mar...@databricks.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> As part of the discussion on this topic, I would love to
>>>>>>>>>> highlight the work the community is currently doing to support
>>>>>>>>>> SparkML, which is traditionally very RDD-heavy, natively in
>>>>>>>>>> Spark Connect. Bobby's awesome work shows that, over time, we
>>>>>>>>>> can extend the features of Spark Connect and support workloads
>>>>>>>>>> that we previously thought could not be supported easily.
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/spark/pull/48791
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 27, 2024 at 11:42 AM Yang,Jie(INF) <yangji...@baidu.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>> From: Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>>>> Date: 2024-11-27 08:04:06
>>>>>>>>>>> Subject: [External] Re: Spark Connect the default API in Spark 4.0
>>>>>>>>>>> To: Bjørn Jørgensen <bjornjorgen...@gmail.com>
>>>>>>>>>>> Cc: Herman van Hovell <her...@databricks.com.invalid>; Spark dev list <dev@spark.apache.org>
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 25 Nov 2024 at 23:33, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 25 Nov 2024 at 14:48, Herman van Hovell <her...@databricks.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to start a discussion on "Spark Connect the
>>>>>>>>>>>>> default API in Spark 4.0".
>>>>>>>>>>>>>
>>>>>>>>>>>>> The rationale for this change is that Spark Connect brings a
>>>>>>>>>>>>> lot of improvements with respect to simplicity, stability,
>>>>>>>>>>>>> isolation, upgradability, and extensibility (all detailed in
>>>>>>>>>>>>> the SPIP). In a nutshell: we want to introduce a flag,
>>>>>>>>>>>>> spark.api.mode, that allows a user to choose between classic
>>>>>>>>>>>>> and connect mode, with connect as the default. A user can
>>>>>>>>>>>>> easily fall back to classic by setting spark.api.mode to
>>>>>>>>>>>>> classic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SPIP: https://docs.google.com/document/d/1C0kuQEliG78HujVwdnSk0wjNwHEDdwo2o8aVq7kbhTo/edit?tab=t.0#heading=h.r2c3xrbiklu3
>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-50411
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am looking forward to your feedback!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Herman
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Bjørn Jørgensen
>>>>>>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>>>>>>> Norge
>>>>>>>>>>>>
>>>>>>>>>>>> +47 480 94 297
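To make Herman's parity point concrete, here is a minimal Scala sketch that stays entirely within org.apache.spark.sql and should therefore run unchanged whether spark.api.mode resolves to connect or classic; the app name, input path, and column names are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ConnectByDefault {
      def main(args: Array[String]): Unit = {
        // Under the proposal this session would be a Connect session by default;
        // nothing below touches RDDs or SparkContext, so the API mode is invisible.
        val spark = SparkSession.builder().appName("connect-by-default").getOrCreate()

        val events = spark.read.json("/data/events") // hypothetical dataset
        events
          .groupBy(col("level"))
          .agg(count("*").as("n"))
          .orderBy(desc("n"))
          .show()

        spark.stop()
      }
    }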