Not an R user myself, but +1.

I first wondered about the future of SparkR after noticing how low the visit 
stats were for the R API docs as compared to Python and Scala 
<https://lists.apache.org/thread/jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z>. (I can’t 
seem to find those visit stats for the API docs anymore 
<https://analytics.apache.org/index.php?module=CoreHome&action=index&date=today&period=month&idSite=40#?period=month&date=2024-07-02&idSite=40&category=General_Actions&subcategory=General_Pages>.)


> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman 
> <shivaram.venkatara...@gmail.com> wrote:
> 
> Hi
> 
> About ten years ago, I created the original SparkR package as part of my 
> research at UC Berkeley [SPARK-5654 
> <https://issues.apache.org/jira/browse/SPARK-5654>]. After my PhD, I started 
> as a professor at UW-Madison, and my contributions to SparkR have taken a 
> back seat given my limited availability. I continue to be involved in the 
> community and teach a popular course at UW-Madison that uses Apache Spark 
> for programming assignments. 
> 
> As the original contributor and author of a research paper on SparkR, I also 
> continue to get private emails from users. A common question I get is whether 
> one should use SparkR in Apache Spark or the sparklyr package (built on top 
> of Apache Spark). You can also see this in StackOverflow questions and other 
> blog posts online: https://www.google.com/search?q=sparkr+vs+sparklyr . 
> While I have encouraged users to choose the SparkR package, as it is 
> maintained by the Apache project, the more I looked into sparklyr, the more 
> I became convinced that it is a better choice for R users who want to 
> leverage the power of Spark:
> 
> (1) sparklyr is developed by a community of developers who understand the R 
> programming language deeply, and as a result its API is more idiomatic. In 
> hindsight, sparklyr’s more idiomatic approach would have been a better 
> choice than the Scala-like API we have in SparkR. [A rough side-by-side 
> sketch follows this list.]
> 
> (2) Contributions to SparkR have gradually declined. Over the last two 
> years, there have been 65 commits to the Spark R codebase (compared to 
> ~2200 on the Spark Python codebase). In contrast, sparklyr has had over 300 
> commits in the same period.
> 
> (3) Previously, using and deploying sparklyr was cumbersome, as it needed 
> careful alignment of versions between Apache Spark and sparklyr. However, 
> the sparklyr community has implemented a new Spark Connect-based 
> architecture that eliminates this issue. [A connection sketch follows the 
> quoted message.]
> 
> (4) The sparklyr community has maintained their package on CRAN – this 
> takes some effort, as the CRAN release process requires passing a number of 
> tests. While SparkR was on CRAN initially, we could not keep it there given 
> our release process and cadence. This makes sparklyr much more accessible 
> to the R community.
> 
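[To make point (1) above concrete: a rough side-by-side sketch of the same
filter-and-select in each API. This is only meant to illustrate the difference
in style; it assumes a local Spark installation, uses R's built-in 'faithful'
data set, and the two halves are best run in separate R sessions since both
packages define filter() and select().]

    ## SparkR: function calls on Spark DataFrames, close to the Scala API
    library(SparkR)
    sparkR.session()                       # assumes SPARK_HOME points at a Spark install

    df  <- as.DataFrame(faithful)          # built-in R data set
    out <- select(filter(df, df$waiting < 50), "eruptions")
    head(out)

    ## sparklyr: the usual dplyr verbs, translated to Spark SQL behind the scenes
    library(sparklyr)
    library(dplyr)
    sc  <- spark_connect(master = "local") # assumes Spark installed locally, e.g. via spark_install()
    tbl <- copy_to(sc, faithful, overwrite = TRUE)

    tbl %>%
      filter(waiting < 50) %>%
      select(eruptions) %>%
      head() %>%
      collect()
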
> So it is with a bittersweet feeling that I’m writing this email to propose 
> that we deprecate SparkR and recommend sparklyr as the R language binding 
> for Spark. This will reduce the complexity of our own codebase and, more 
> importantly, reduce confusion for users. As the sparklyr package is 
> distributed under the same permissive license as Apache Spark, there should 
> be no downside for existing SparkR users in adopting it.
> 
> My proposal is to mark SparkR as deprecated in the upcoming Spark 4 release, 
> and remove it from Apache Spark with the following major release, Spark 5.
> 
> I’m looking forward to hearing your thoughts and feedback on this proposal, 
> and I’m happy to create the SPIP ticket for a vote, using this email thread 
> as the justification.
> 
> Thanks
> Shivaram
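
[On point (3): my understanding from the sparklyr docs is that the Spark
Connect path goes through the pysparklyr extension, and connecting looks
roughly like the sketch below. The endpoint and the exact arguments are
assumptions on my part, so treat this as illustrative rather than a recipe.]

    library(sparklyr)                    # plus the pysparklyr extension package installed
    # With Spark Connect, the client talks to a remote "sc://" endpoint over gRPC,
    # which is what removes the tight version coupling mentioned in point (3).
    sc <- spark_connect(
      master  = "sc://localhost:15002",  # hypothetical Spark Connect endpoint (15002 is the default port)
      method  = "spark_connect",         # handled by pysparklyr, as I understand it
      version = "3.5"
    )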
