+1
-------- Original Message --------
From: Jungtaek Lim<kabhwan.opensou...@gmail.com>
Date: 2024-08-16 09:06:52
Subject: [External Mail] Re: [DISCUSS] Deprecating SparkR
To: Wenchen Fan<cloud0...@gmail.com>;
Cc: L. C. Hsieh<vii...@gmail.com>;Dongjoon Hyun<dongjoon.h...@gmail.com>;Holden 
Karau<holden.ka...@gmail.com>;Xiao Li<gatorsm...@gmail.com>;Hyukjin 
Kwon<gurwls...@apache.org>;Nicholas 
Chammas<nicholas.cham...@gmail.com>;Shivaram 
Venkataraman<shivaram.venkatara...@gmail.com>;dev<dev@spark.apache.org>;
+1

Looks to be sufficient to VOTE?

On Wed, Aug 14, 2024 at 1:10 AM Wenchen Fan 
<cloud0...@gmail.com<mailto:cloud0...@gmail.com>> wrote:
+1

On Tue, Aug 13, 2024 at 10:50 PM L. C. Hsieh 
<vii...@gmail.com<mailto:vii...@gmail.com>> wrote:
+1

On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun 
<dongjoon.h...@gmail.com<mailto:dongjoon.h...@gmail.com>> wrote:
>
> +1
>
> Dongjoon
>
> On Mon, Aug 12, 2024 at 17:52 Holden Karau 
> <holden.ka...@gmail.com<mailto:holden.ka...@gmail.com>> wrote:
>>
>> +1
>>
>> Are the sparklyr folks on this list?
>>
>> Twitter: 
>> https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): 
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: 
>> https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li 
>> <gatorsm...@gmail.com<mailto:gatorsm...@gmail.com>> wrote:
>>>
>>> +1
>>>
>>> On Mon, Aug 12, 2024 at 4:18 PM Hyukjin Kwon 
>>> <gurwls...@apache.org<mailto:gurwls...@apache.org>> wrote:
>>>>
>>>> +1
>>>>
>>>> On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas 
>>>> <nicholas.cham...@gmail.com<mailto:nicholas.cham...@gmail.com>> wrote:
>>>>>
>>>>> And just for the record, the stats that I screenshotted in that thread I 
>>>>> linked to showed the following page views for each sub-section under 
>>>>> `docs/latest/api/`:
>>>>>
>>>>> - python: 758K
>>>>> - java: 66K
>>>>> - sql: 39K
>>>>> - scala: 35K
>>>>> - r: <1K
>>>>>
>>>>> I don’t recall what time period those stats covered, and how the 
>>>>> stats are gathered and how the various language API docs are accessed 
>>>>> certainly affect those numbers. So 
>>>>> it’s by no means a solid, objective measure. But I thought it was an 
>>>>> interesting signal nonetheless.
>>>>>
>>>>>
>>>>> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas 
>>>>> <nicholas.cham...@gmail.com<mailto:nicholas.cham...@gmail.com>> wrote:
>>>>>
>>>>> Not an R user myself, but +1.
>>>>>
>>>>> I first wondered about the future of SparkR after noticing how low the 
>>>>> visit stats were for the R API docs as compared to Python and Scala. (I 
>>>>> can’t seem to find those visit stats for the API docs anymore.)
>>>>>
>>>>>
>>>>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman 
>>>>> <shivaram.venkatara...@gmail.com<mailto:shivaram.venkatara...@gmail.com>> 
>>>>> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> About ten years ago, I created the original SparkR package as part of my 
>>>>> research at UC Berkeley [SPARK-5654]. After my PhD I started as a 
>>>>> professor at UW-Madison and my contributions to SparkR have been in the 
>>>>> background given my availability. I continue to be involved in the 
>>>>> community and teach a popular course at UW-Madison which uses Apache 
>>>>> Spark for programming assignments.
>>>>>
>>>>> As the original contributor and author of a research paper on SparkR, I 
>>>>> also continue to get private emails from users. A common question I get 
>>>>> is whether one should use SparkR in Apache Spark or the sparklyr package 
>>>>> (built on top of Apache Spark). You can also see this in StackOverflow 
>>>>> questions and other blog posts online: 
>>>>> https://www.google.com/search?q=sparkr+vs+sparklyr.
>>>>> While I have encouraged users to choose the SparkR package as it is 
>>>>> maintained by the Apache project, the more I looked into sparklyr, the 
>>>>> more I was convinced that it is a better choice for R users who want to 
>>>>> leverage the power of Spark:
>>>>>
>>>>> (1) sparklyr is developed by a community of developers who understand the 
>>>>> R programming language deeply, and as a result is more idiomatic. In 
>>>>> hindsight, sparklyr’s more idiomatic approach would have been a better 
>>>>> choice than the Scala-like API we have in SparkR.
>>>>>
>>>>> (2) Contributions to SparkR have decreased slowly. Over the last two 
>>>>> years, there have been 65 commits on the SparkR codebase (compared to 
>>>>> ~2200 on the Spark Python codebase). In contrast, sparklyr has over 300 
>>>>> commits in the same period.
>>>>>
>>>>> (3) Previously, using and deploying sparklyr had been cumbersome as it 
>>>>> needed careful alignment of versions between Apache Spark and sparklyr. 
>>>>> However, the sparklyr community has implemented a new Spark Connect based 
>>>>> architecture which eliminates this issue.
>>>>>
>>>>> (4) The sparklyr community has maintained their package on CRAN – it 
>>>>> takes some effort to do this as the CRAN release process requires passing 
>>>>> a number of tests. While SparkR was on CRAN initially, we could not 
>>>>> maintain that given our release process and cadence. This makes sparklyr 
>>>>> much more accessible to the R community.
>>>>>
>>>>> So it is with a bittersweet feeling that I’m writing this email to 
>>>>> propose that we deprecate SparkR, and recommend sparklyr as the R 
>>>>> language binding for Spark. This will reduce complexity of our own 
>>>>> codebase, and more importantly reduce confusion for users. As the 
>>>>> sparklyr package is distributed using the same permissive license as 
>>>>> Apache Spark, there should be no downside for existing SparkR users in 
>>>>> adopting it.
>>>>>
>>>>> My proposal is to mark SparkR as deprecated in the upcoming Spark 4 
>>>>> release, and remove it from Apache Spark with the following major 
>>>>> release, Spark 5.
>>>>>
>>>>> I’m looking forward to hearing your thoughts and feedback on this 
>>>>> proposal and I’m happy to create the SPIP ticket for a vote on this 
>>>>> proposal using this email thread as the justification.
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>>
>>>>>

---------------------------------------------------------------------
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org<mailto:dev-unsubscr...@spark.apache.org>
