And just for the record, the stats I screenshotted <https://lists.apache.org/api/email.lua?attachment=true&id=jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z&file=7a28ae0d6eb4c25e047ff90601a941f7acfc3214f837604b545b4f926b8eb628> in the thread I linked to showed the following page views for each sub-section under `docs/latest/api/`:
- python: 758K
- java: 66K
- sql: 39K
- scala: 35K
- r: <1K

I don’t recall over what time period those stats were collected, and there are certainly factors in how the stats are gathered and how the various language API docs are accessed that affect those numbers. So it’s by no means a solid, objective measure, but I thought it was an interesting signal nonetheless.

> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> Not an R user myself, but +1.
>
> I first wondered about the future of SparkR after noticing <https://lists.apache.org/thread/jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z> how low the visit stats were for the R API docs as compared to Python and Scala. (I can’t seem to find those visit stats <https://analytics.apache.org/index.php?module=CoreHome&action=index&date=today&period=month&idSite=40#?period=month&date=2024-07-02&idSite=40&category=General_Actions&subcategory=General_Pages> for the API docs anymore.)
>
>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <shivaram.venkatara...@gmail.com> wrote:
>>
>> Hi
>>
>> About ten years ago, I created the original SparkR package as part of my research at UC Berkeley [SPARK-5654 <https://issues.apache.org/jira/browse/SPARK-5654>]. After my PhD I started as a professor at UW-Madison, and my contributions to SparkR have been in the background given my availability. I continue to be involved in the community and teach a popular course at UW-Madison which uses Apache Spark for programming assignments.
>>
>> As the original contributor and author of a research paper on SparkR, I also continue to get private emails from users. A common question I get is whether one should use SparkR in Apache Spark or the sparklyr package (built on top of Apache Spark). You can also see this in StackOverflow questions and other blog posts online: https://www.google.com/search?q=sparkr+vs+sparklyr .
>> While I have encouraged users to choose the SparkR package as it is maintained by the Apache project, the more I looked into sparklyr, the more I was convinced that it is a better choice for R users who want to leverage the power of Spark:
>>
>> (1) sparklyr is developed by a community of developers who understand the R programming language deeply, and as a result it is more idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been a better choice than the Scala-like API we have in SparkR.
>>
>> (2) Contributions to SparkR have slowly decreased. Over the last two years, there have been 65 commits on the Spark R codebase (compared to ~2200 on the Spark Python codebase). In contrast, sparklyr has had over 300 commits in the same period.
>>
>> (3) Previously, using and deploying sparklyr had been cumbersome, as it needed careful alignment of versions between Apache Spark and sparklyr. However, the sparklyr community has implemented a new Spark Connect based architecture which eliminates this issue.
>>
>> (4) The sparklyr community has maintained their package on CRAN – it takes some effort to do this, as the CRAN release process requires passing a number of tests. While SparkR was on CRAN initially, we could not maintain that given our release process and cadence. This makes sparklyr much more accessible to the R community.
>>
>> So it is with a bittersweet feeling that I’m writing this email to propose that we deprecate SparkR and recommend sparklyr as the R language binding for Spark. This will reduce the complexity of our own codebase and, more importantly, reduce confusion for users. As the sparklyr package is distributed under the same permissive license as Apache Spark, there should be no downside for existing SparkR users in adopting it.
>>
>> My proposal is to mark SparkR as deprecated in the upcoming Spark 4 release, and to remove it from Apache Spark in the following major release, Spark 5.
>>
>> I’m looking forward to hearing your thoughts and feedback on this proposal, and I’m happy to create the SPIP ticket for a vote on this proposal using this email thread as the justification.
>>
>> Thanks
>> Shivaram
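For anyone who hasn’t used both packages, the idiomatic gap Shivaram describes in point (1) can be sketched roughly like this (a sketch only, assuming a local Spark installation; the column names come from R’s built-in `faithful` dataset):

```r
# SparkR: function-call style, closer to the Scala DataFrame API
library(SparkR)
sparkR.session()                       # assumes SPARK_HOME points at a local Spark install
df <- createDataFrame(faithful)
head(select(filter(df, df$waiting > 70), "eruptions"))

# sparklyr: dplyr-style pipeline, idiomatic for most R users
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # assumes a local Spark install
tbl <- copy_to(sc, faithful)
tbl %>% filter(waiting > 70) %>% select(eruptions) %>% head()
```

(Shown side by side only for comparison; both packages export `filter`/`select`, so in practice you would load them in separate sessions to avoid name masking.)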