+1

On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> +1
>
> Dongjoon
>
> On Mon, Aug 12, 2024 at 17:52 Holden Karau <holden.ka...@gmail.com> wrote:
>>
>> +1
>>
>> Are the sparklyr folks on this list?
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> On Mon, Aug 12, 2024 at 16:18, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>
>>>> +1
>>>>
>>>> On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>> And just for the record, the stats that I screenshotted in the thread I linked to showed the following page views for each sub-section under `docs/latest/api/`:
>>>>>
>>>>> - python: 758K
>>>>> - java: 66K
>>>>> - sql: 39K
>>>>> - scala: 35K
>>>>> - r: <1K
>>>>>
>>>>> I don’t recall over what time period those stats were collected, and there are certainly factors in how the stats are gathered and how the various language API docs are accessed that affect those numbers. So it’s by no means a solid, objective measure. But I thought it was an interesting signal nonetheless.
>>>>>
>>>>> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>> Not an R user myself, but +1.
>>>>>
>>>>> I first wondered about the future of SparkR after noticing how low the visit stats were for the R API docs compared to Python and Scala. (I can’t seem to find those visit stats for the API docs anymore.)
>>>>>
>>>>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <shivaram.venkatara...@gmail.com> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> About ten years ago, I created the original SparkR package as part of my research at UC Berkeley [SPARK-5654].
>>>>> After my PhD I started as a professor at UW-Madison, and my contributions to SparkR have since been limited by my availability. I continue to be involved in the community and teach a popular course at UW-Madison that uses Apache Spark for programming assignments.
>>>>>
>>>>> As the original contributor and the author of a research paper on SparkR, I also continue to get private emails from users. A common question I get is whether one should use SparkR in Apache Spark or the sparklyr package (built on top of Apache Spark). You can also see this in StackOverflow questions and other blog posts online: https://www.google.com/search?q=sparkr+vs+sparklyr . While I have encouraged users to choose the SparkR package, as it is maintained by the Apache project, the more I looked into sparklyr, the more I was convinced that it is a better choice for R users who want to leverage the power of Spark:
>>>>>
>>>>> (1) sparklyr is developed by a community of developers who understand the R programming language deeply, and as a result it is more idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been a better choice than the Scala-like API we have in SparkR.
>>>>>
>>>>> (2) Contributions to SparkR have slowly decreased. Over the last two years, there have been 65 commits to the Spark R codebase (compared to ~2200 to the Spark Python codebase). In contrast, sparklyr has had over 300 commits in the same period.
>>>>>
>>>>> (3) Previously, deploying and using sparklyr was cumbersome, as it required careful alignment of versions between Apache Spark and sparklyr. However, the sparklyr community has implemented a new Spark Connect based architecture which eliminates this issue.
>>>>>
>>>>> (4) The sparklyr community has maintained their package on CRAN – it takes some effort to do this, as the CRAN release process requires passing a number of tests. While SparkR was on CRAN initially, we could not maintain that given our release process and cadence. This makes sparklyr much more accessible to the R community.
>>>>>
>>>>> So it is with a bittersweet feeling that I’m writing this email to propose that we deprecate SparkR and recommend sparklyr as the R language binding for Spark. This will reduce the complexity of our own codebase and, more importantly, reduce confusion for users. As the sparklyr package is distributed under the same permissive license as Apache Spark, there should be no downside for existing SparkR users in adopting it.
>>>>>
>>>>> My proposal is to mark SparkR as deprecated in the upcoming Spark 4 release, and to remove it from Apache Spark in the following major release, Spark 5.
>>>>>
>>>>> I’m looking forward to hearing your thoughts and feedback on this proposal, and I’m happy to create the SPIP ticket for a vote on it using this email thread as the justification.
>>>>>
>>>>> Thanks
>>>>> Shivaram
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org