+1

On 2024/08/13 23:45:11, Xiangrui Meng wrote:

+1

On Tue, Aug 13, 2024, 2:43 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

+1

Looks to be sufficient to VOTE?

On Wed, Aug 14, 2024 at 1:10 AM, Wenchen Fan <cloud0...@gmail.com> wrote:

+1

On Tue, Aug 13, 2024 at 10:50 PM, L. C. Hsieh <vii...@gmail.com> wrote:

+1

On Tue, Aug 13, 2024 at 2:54 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1

Dongjoon

On Mon, Aug 12, 2024 at 17:52, Holden Karau <holden.ka...@gmail.com> wrote:

+1

Are the sparklyr folks on this list?

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Mon, Aug 12, 2024 at 5:22 PM, Xiao Li <gatorsm...@gmail.com> wrote:

+1

On Mon, Aug 12, 2024 at 16:18, Hyukjin Kwon <gurwls...@apache.org> wrote:

+1

On Tue, Aug 13, 2024 at 7:04 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

And just for the record, the stats that I screenshotted in the thread I linked to showed the following page views for each sub-section under `docs/latest/api/`:

- python: 758K
- java: 66K
- sql: 39K
- scala: 35K
- r: <1K

I don’t recall over what time period those stats were collected, and there are certainly some factors in how the stats are gathered, and in how the various language API docs are accessed, that affect those numbers. So it’s by no means a solid, objective measure. But I thought it was an interesting signal nonetheless.

On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

Not an R user myself, but +1.

I first wondered about the future of SparkR after noticing how low the visit stats were for the R API docs compared to those for Python and Scala. (I can’t seem to find those visit stats for the API docs anymore.)

On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <shivaram.venkatara...@gmail.com> wrote:

Hi,

About ten years ago, I created the original SparkR package as part of my research at UC Berkeley [SPARK-5654]. After my PhD, I started as a professor at UW-Madison, and my contributions to SparkR have been in the background given my limited availability. I continue to be involved in the community and teach a popular course at UW-Madison which uses Apache Spark for programming assignments.

As the original contributor and author of a research paper on SparkR, I also continue to get private emails from users. A common question I get is whether one should use SparkR in Apache Spark or the sparklyr package (built on top of Apache Spark). You can also see this in StackOverflow questions and other blog posts online: https://www.google.com/search?q=sparkr+vs+sparklyr . While I have encouraged users to choose the SparkR package as it is maintained by the Apache project, the more I looked into sparklyr, the more I was convinced that it is a better choice for R users who want to leverage the power of Spark:

(1) sparklyr is developed by a community of developers who understand the R programming language deeply, and as a result it is more idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been a better choice than the Scala-like API we have in SparkR.
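To make the contrast concrete, here is a minimal sketch of the same aggregation in both APIs. It is illustrative only: it assumes local installations of Spark, SparkR, and sparklyr, uses R’s built-in mtcars dataset, and each half should be run in a fresh R session since SparkR and dplyr mask each other’s verbs (filter, select, and so on).

    # SparkR: Scala-like API; functions take the DataFrame as an argument
    library(SparkR)
    sparkR.session(master = "local[*]")
    df <- createDataFrame(mtcars)
    result <- agg(groupBy(filter(df, df$cyl > 4), "cyl"),
                  avg_mpg = avg(df$mpg))
    head(result)

    # sparklyr: dplyr verbs compose with pipes, as in ordinary R code
    library(sparklyr)
    library(dplyr)
    sc <- spark_connect(master = "local")
    tbl <- copy_to(sc, mtcars)
    tbl %>%
      filter(cyl > 4) %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg))

The second snippet is the same code an R user would write against a local data frame with dplyr; sparklyr translates it to Spark SQL behind the scenes, which is what “more idiomatic” means in practice here.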
(2) Contributions to SparkR have slowly declined. Over the last two years, there have been 65 commits on the Spark R codebase (compared to ~2200 on the Spark Python codebase). In contrast, sparklyr has had over 300 commits in the same period.

(3) Previously, using and deploying sparklyr had been cumbersome, as it needed careful alignment of versions between Apache Spark and sparklyr. However, the sparklyr community has implemented a new Spark Connect based architecture which eliminates this issue (a connection sketch follows at the end of this message).

(4) The sparklyr community has maintained their package on CRAN; it takes some effort to do this, as the CRAN release process requires passing a number of tests. While SparkR was on CRAN initially, we could not maintain that given our release process and cadence. This makes sparklyr much more accessible to the R community.

So it is with a bittersweet feeling that I’m writing this email to propose that we deprecate SparkR and recommend sparklyr as the R language binding for Spark. This will reduce the complexity of our own codebase and, more importantly, reduce confusion for users. As the sparklyr package is distributed under the same permissive license as Apache Spark, there should be no downside for existing SparkR users in adopting it.

My proposal is to mark SparkR as deprecated in the upcoming Spark 4 release, and to remove it from Apache Spark in the following major release, Spark 5.

I’m looking forward to hearing your thoughts and feedback on this proposal, and I’m happy to create the SPIP ticket for a vote on it, using this email thread as the justification.

Thanks
Shivaram
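As referenced in (3), connecting from sparklyr over Spark Connect looks roughly like the following. This is a sketch, not a definitive recipe: it assumes the pysparklyr extension package is installed and a Spark Connect server is already listening on the default port 15002, and the exact parameters may differ across sparklyr versions.

    library(sparklyr)
    # Spark Connect connection via the pysparklyr extension (assumed
    # installed); "version" selects the matching client environment.
    sc <- spark_connect(master  = "sc://localhost:15002",
                        method  = "spark_connect",
                        version = "3.5")
    tbl <- copy_to(sc, mtcars)   # used the same way as a classic connection
    spark_disconnect(sc)

Because the client speaks the Connect protocol rather than loading Spark’s JVM classes directly, it no longer has to be aligned jar-for-jar with the server’s Spark version, which is the deployment problem point (3) describes.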
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org