+1

On Tue, Aug 13, 2024, 2:43 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> +1
>
> Looks to be sufficient to VOTE?
>
> On Wed, Aug 14, 2024 at 1:10 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1
>>
>> On Tue, Aug 13, 2024 at 10:50 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > Dongjoon
>>> >
>>> > On Mon, Aug 12, 2024 at 17:52 Holden Karau <holden.ka...@gmail.com>
>>> wrote:
>>> >>
>>> >> +1
>>> >>
>>> >> Are the sparklyr folks on this list?
>>> >>
>>> >> Twitter: https://twitter.com/holdenkarau
>>> >> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> >> Pronouns: she/her
>>> >>
>>> >>
>>> >> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Mon, Aug 12, 2024 at 4:18 PM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>> >>>>
>>> >>>> +1
>>> >>>>
>>> >>>> On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> And just for the record, the stats that I screenshotted in that
>>> thread I linked to showed the following page views for each sub-section
>>> under `docs/latest/api/`:
>>> >>>>>
>>> >>>>> - python: 758K
>>> >>>>> - java: 66K
>>> >>>>> - sql: 39K
>>> >>>>> - scala: 35K
>>> >>>>> - r: <1K
>>> >>>>>
>>> >>>>> I don’t recall what time period those stats were collected
>>> over, and there are certainly some factors in how the stats are gathered and
>>> how the various language API docs are accessed that impact those numbers.
>>> So it’s by no means a solid, objective measure. But I thought it was an
>>> interesting signal nonetheless.
>>> >>>>>
>>> >>>>>
>>> >>>>> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Not an R user myself, but +1.
>>> >>>>>
>>> >>>>> I first wondered about the future of SparkR after noticing how low
>>> the visit stats were for the R API docs as compared to Python and Scala. (I
>>> can’t seem to find those visit stats for the API docs anymore.)
>>> >>>>>
>>> >>>>>
>>> >>>>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <
>>> shivaram.venkatara...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi
>>> >>>>>
>>> >>>>> About ten years ago, I created the original SparkR package as part
>>> of my research at UC Berkeley [SPARK-5654]. After my PhD I started as a
>>> professor at UW-Madison and my contributions to SparkR have been in the
>>> background given my availability. I continue to be involved in the
>>> community and teach a popular course at UW-Madison which uses Apache Spark
>>> for programming assignments.
>>> >>>>>
>>> >>>>> As the original contributor and author of a research paper on
>>> SparkR, I also continue to get private emails from users. A common question
>>> I get is whether one should use SparkR in Apache Spark or the sparklyr
>>> package (built on top of Apache Spark). You can also see this in
>>> StackOverflow questions and other blog posts online:
>>> https://www.google.com/search?q=sparkr+vs+sparklyr . While I have
>>> encouraged users to choose the SparkR package as it is maintained by the
>>> Apache project, the more I looked into sparklyr, the more I was convinced
>>> that it is a better choice for R users that want to leverage the power of
>>> Spark:
>>> >>>>>
>>> >>>>> (1) sparklyr is developed by a community of developers who
>>> understand the R programming language deeply, and as a result is more
>>> idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been
>>> a better choice than the Scala-like API we have in SparkR.
>>> >>>>>
>>> >>>>> (2) Contributions to SparkR have decreased slowly. Over the last
>>> two years, there have been 65 commits on the Spark R codebase (compared to
>>> ~2200 on the Spark Python codebase). In contrast, sparklyr has over 300
>>> commits in the same period.
>>> >>>>>
>>> >>>>> (3) Previously, using and deploying sparklyr had been cumbersome
>>> as it needed careful alignment of versions between Apache Spark and
>>> sparklyr. However, the sparklyr community has implemented a new Spark
>>> Connect based architecture which eliminates this issue.
>>> >>>>>
>>> >>>>> (4) The sparklyr community has maintained their package on CRAN –
>>> it takes some effort to do this as the CRAN release process requires
>>> passing a number of tests. While SparkR was on CRAN initially, we could not
>>> maintain that given our release process and cadence. This makes sparklyr
>>> much more accessible to the R community.
>>> >>>>>
>>> >>>>> So it is with a bittersweet feeling that I’m writing this email to
>>> propose that we deprecate SparkR, and recommend sparklyr as the R language
>>> binding for Spark. This will reduce complexity of our own codebase, and
>>> more importantly reduce confusion for users. As the sparklyr package is
>>> distributed using the same permissive license as Apache Spark, there should
>>> be no downside for existing SparkR users in adopting it.
>>> >>>>>
>>> >>>>> My proposal is to mark SparkR as deprecated in the upcoming Spark
>>> 4 release, and remove it from Apache Spark with the following major
>>> release, Spark 5.
>>> >>>>>
>>> >>>>> I’m looking forward to hearing your thoughts and feedback on this
>>> proposal and I’m happy to create the SPIP ticket for a vote on this
>>> proposal using this email thread as the justification.
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>> Shivaram
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
