+1. Looks to be sufficient to VOTE?
On Wed, Aug 14, 2024 at 1:10 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
> +1
>
> On Tue, Aug 13, 2024 at 10:50 PM L. C. Hsieh <vii...@gmail.com> wrote:
>> +1
>>
>> On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>> > +1
>> >
>> > Dongjoon
>> >
>> > On Mon, Aug 12, 2024 at 17:52 Holden Karau <holden.ka...@gmail.com> wrote:
>> >> +1
>> >>
>> >> Are the sparklyr folks on this list?
>> >>
>> >> Twitter: https://twitter.com/holdenkarau
>> >> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> >> Pronouns: she/her
>> >>
>> >> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li <gatorsm...@gmail.com> wrote:
>> >>> +1
>> >>>
>> >>> On Mon, Aug 12, 2024 at 4:18 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>> >>>> +1
>> >>>>
>> >>>> On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>> >>>>> And just for the record, the stats that I screenshotted in that thread I linked to showed the following page views for each sub-section under `docs/latest/api/`:
>> >>>>>
>> >>>>> - python: 758K
>> >>>>> - java: 66K
>> >>>>> - sql: 39K
>> >>>>> - scala: 35K
>> >>>>> - r: <1K
>> >>>>>
>> >>>>> I don’t recall what time period those stats were collected over, and there are certainly factors in how the stats are gathered and how the various language API docs are accessed that affect those numbers. So it’s by no means a solid, objective measure, but I thought it was an interesting signal nonetheless.
>> >>>>>
>> >>>>> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>> >>>>>
>> >>>>> Not an R user myself, but +1.
>> >>>>>
>> >>>>> I first wondered about the future of SparkR after noticing how low the visit stats were for the R API docs compared to Python and Scala. (I can’t seem to find those visit stats for the API docs anymore.)
>> >>>>>
>> >>>>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <shivaram.venkatara...@gmail.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> About ten years ago, I created the original SparkR package as part of my research at UC Berkeley [SPARK-5654]. After my PhD I started as a professor at UW-Madison, and my contributions to SparkR have been in the background given my availability. I continue to be involved in the community and teach a popular course at UW-Madison which uses Apache Spark for programming assignments.
>> >>>>>
>> >>>>> As the original contributor and author of a research paper on SparkR, I also continue to get private emails from users. A common question I get is whether one should use SparkR in Apache Spark or the sparklyr package (built on top of Apache Spark). You can also see this in StackOverflow questions and other blog posts online: https://www.google.com/search?q=sparkr+vs+sparklyr . While I have encouraged users to choose the SparkR package as it is maintained by the Apache project, the more I looked into sparklyr, the more I was convinced that it is a better choice for R users who want to leverage the power of Spark:
>> >>>>>
>> >>>>> (1) sparklyr is developed by a community of developers who understand the R programming language deeply, and as a result it is more idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been a better choice than the Scala-like API we have in SparkR. (See the first sketch below this thread for an illustration of the difference.)
>> >>>>> (2) Contributions to SparkR have decreased slowly. Over the last two years, there have been 65 commits on the Spark R codebase (compared to ~2200 on the Spark Python codebase). In contrast, sparklyr has had over 300 commits in the same period.
>> >>>>>
>> >>>>> (3) Previously, using and deploying sparklyr had been cumbersome, as it needed careful alignment of versions between Apache Spark and sparklyr. However, the sparklyr community has implemented a new Spark Connect based architecture which eliminates this issue. (See the second sketch below this thread.)
>> >>>>>
>> >>>>> (4) The sparklyr community has maintained their package on CRAN – it takes some effort to do this, as the CRAN release process requires passing a number of tests. While SparkR was on CRAN initially, we could not maintain that given our release process and cadence. This makes sparklyr much more accessible to the R community.
>> >>>>>
>> >>>>> So it is with a bittersweet feeling that I’m writing this email to propose that we deprecate SparkR and recommend sparklyr as the R language binding for Spark. This will reduce the complexity of our own codebase and, more importantly, reduce confusion for users. As the sparklyr package is distributed under the same permissive license as Apache Spark, there should be no downside for existing SparkR users in adopting it.
>> >>>>>
>> >>>>> My proposal is to mark SparkR as deprecated in the upcoming Spark 4 release and remove it from Apache Spark in the following major release, Spark 5.
>> >>>>>
>> >>>>> I’m looking forward to hearing your thoughts and feedback on this proposal, and I’m happy to create the SPIP ticket for a vote on it using this email thread as the justification.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Shivaram
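For reference on point (1) above, here is a minimal illustrative sketch of the two API styles, assuming a local Spark installation and using R’s built-in `faithful` dataset; the exact output is beside the point, and the function names should be checked against the SparkR and sparklyr documentation for your versions.

    # SparkR: Scala-like DataFrame API with Spark-specific verbs.
    library(SparkR)
    sparkR.session()

    df <- createDataFrame(faithful)
    head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

    # sparklyr: plain dplyr verbs over a Spark connection.
    # (Shown together for comparison; in practice run the two halves in
    #  separate R sessions to avoid masking conflicts between SparkR and dplyr.)
    library(sparklyr)
    library(dplyr)

    sc  <- spark_connect(master = "local")
    tbl <- copy_to(sc, faithful)

    tbl %>%
      group_by(waiting) %>%
      summarise(count = n()) %>%
      collect()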
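And for point (3), a hypothetical sketch of what a Spark Connect based connection looks like from sparklyr. This assumes the pysparklyr extension; the `method = "spark_connect"` argument, the `sc://` URL scheme, the port, and the version string are assumptions to verify against the sparklyr/pysparklyr documentation for your setup.

    # Assumed sparklyr + pysparklyr Spark Connect connection; the host, port,
    # and argument values below are placeholders, not a definitive recipe.
    library(sparklyr)

    sc <- spark_connect(
      master  = "sc://spark-connect-host:15002",  # assumed Spark Connect endpoint
      method  = "spark_connect",                  # assumed method added by pysparklyr
      version = "3.5"                             # assumed server-side Spark version
    )

    # Once connected, the usual dplyr-style workflow applies; the client no longer
    # needs to be built against a specific Spark release, which is the version
    # alignment issue point (3) says this architecture removes.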