+1

On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas <nicholas.cham...@gmail.com>
wrote:

> And just for the record, the stats that I screenshotted
> <https://lists.apache.org/api/email.lua?attachment=true&id=jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z&file=7a28ae0d6eb4c25e047ff90601a941f7acfc3214f837604b545b4f926b8eb628>
> in that thread I linked to showed the following page views for each
> sub-section under `docs/latest/api/`:
>
> - python: 758K
> - java: 66K
> - sql: 39K
> - scala: 35K
> - r: <1K
>
> I don’t recall the time period over which those stats were collected, and
> how the stats are gathered and how the various language API docs are
> accessed certainly affect those numbers. So it’s by no means a solid,
> objective measure. But I thought it was an interesting signal nonetheless.
>
>
> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> Not an R user myself, but +1.
>
> I first wondered about the future of SparkR after noticing
> <https://lists.apache.org/thread/jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z> how
> low the visit stats were for the R API docs compared to Python and
> Scala. (I can’t seem to find those visit stats
> <https://analytics.apache.org/index.php?module=CoreHome&action=index&date=today&period=month&idSite=40#?period=month&date=2024-07-02&idSite=40&category=General_Actions&subcategory=General_Pages>
> for the API docs anymore.)
>
>
> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <
> shivaram.venkatara...@gmail.com> wrote:
>
> Hi
>
> About ten years ago, I created the original SparkR package as part of my
> research at UC Berkeley [SPARK-5654
> <https://issues.apache.org/jira/browse/SPARK-5654>]. After my PhD, I
> joined UW-Madison as a professor, and my contributions to SparkR have
> taken a back seat given my limited availability. I continue to be involved
> in the community and teach a popular course at UW-Madison that uses Apache
> Spark for programming assignments.
>
> As the original contributor and author of a research paper on SparkR, I
> also continue to get private emails from users. A common question I get is
> whether one should use SparkR in Apache Spark or the sparklyr package
> (built on top of Apache Spark). You can also see this in StackOverflow
> questions and blog posts online:
> https://www.google.com/search?q=sparkr+vs+sparklyr . While I have
> encouraged users to choose the SparkR package, as it is maintained by the
> Apache project, the more I looked into sparklyr, the more I became
> convinced that it is a better choice for R users who want to leverage the
> power of Spark:
>
> (1) sparklyr is developed by a community of developers who understand the
> R programming language deeply, and as a result its API is more idiomatic
> (see the first sketch after this list). In hindsight, sparklyr’s more
> idiomatic approach would have been a better choice than the Scala-like API
> we have in SparkR.
>
> (2) Contributions to SparkR have slowly declined. Over the last two
> years, there have been 65 commits to the SparkR codebase (compared to
> ~2,200 to the Spark Python codebase). In contrast, sparklyr has had over
> 300 commits in the same period.
>
> (3) Previously, using and deploying sparklyr was cumbersome, as it
> required careful alignment of versions between Apache Spark and sparklyr.
> However, the sparklyr community has implemented a new Spark Connect-based
> architecture that eliminates this issue (see the second sketch after this
> list).
>
> (4) The sparklyr community has kept its package on CRAN – it takes some
> effort to do this, as the CRAN release process requires passing a number
> of tests. While SparkR was on CRAN initially, we could not keep it there
> given our release process and cadence. This makes sparklyr much more
> accessible to the R community.
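>
> To make point (1) concrete, here is a rough sketch of the same toy query
> against the `faithful` dataset in both APIs. Treat it as illustrative
> rather than definitive: it assumes a plain local Spark install, and the
> query itself is just a toy aggregation.
>
>   # SparkR: function calls nest inside out, Scala-style
>   library(SparkR)
>   sparkR.session()
>   df <- createDataFrame(faithful)
>   head(summarize(groupBy(filter(df, df$waiting < 50), df$waiting),
>                  count = n(df$waiting)))
>
>   # sparklyr: the same query reads top to bottom as a dplyr pipeline
>   library(sparklyr)
>   library(dplyr)
>   sc <- spark_connect(master = "local")
>   tbl <- copy_to(sc, faithful)
>   tbl %>%
>     filter(waiting < 50) %>%
>     group_by(waiting) %>%
>     summarise(count = n()) %>%
>     head()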
>
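> And to make point (3) concrete, a minimal sketch of the new Spark
> Connect-based connection. I’m assuming the pysparklyr extension and its
> `sc://` connection string here; the endpoint and version below are
> placeholders, not a definitive setup.
>
>   library(sparklyr)
>   # The client speaks the Spark Connect protocol over the network, so it
>   # no longer has to be version-matched against the server's Spark JARs.
>   sc <- spark_connect(
>     master  = "sc://localhost:15002",  # assumed Spark Connect endpoint
>     method  = "spark_connect",
>     version = "3.5"                    # assumed server version
>   )
>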
> So it is with a bittersweet feeling that I’m writing this email to propose
> that we deprecate SparkR and recommend sparklyr as the R language binding
> for Spark. This will reduce the complexity of our own codebase and, more
> importantly, reduce confusion for users. As the sparklyr package is
> distributed under the same permissive license as Apache Spark, there should
> be no downside for existing SparkR users in adopting it.
>
> My proposal is to mark SparkR as deprecated in the upcoming Spark 4
> release, and remove it from Apache Spark with the following major release,
> Spark 5.
>
> I’m looking forward to hearing your thoughts and feedback on this proposal,
> and I’m happy to create the SPIP ticket for a vote, using this email thread
> as the justification.
>
> Thanks
> Shivaram
>
