Hi Everyone, I'd like to start a discussion about the possibility of adding magrittr (https://magrittr.tidyverse.org/) as an explicit dependency for SparkR. For those not familiar with the package, it provides a number of small utilities, the most important of which is the %>% function, similar to pipe-forward (|>) in F# or the thread-first macro (->) in Clojure. In other words, it allows us to replace:
    df <- createDataFrame(iris)
    df_filtered <- filter(df, df$Sepal_Width > df$Petal_Length)
    df_projected <- select(df_filtered, min(df$Sepal_Width - df$Petal_Length))
or
    df_projected <- select(
      filter(
        createDataFrame(iris),
        column("Sepal_Width") > column("Petal_Length")
      ),
      min(column("Sepal_Width") - column("Petal_Length"))
    )
with
    df_projected <- createDataFrame(iris) %>%
      filter(.$Sepal_Width > .$Petal_Length) %>%
      select(min(.$Sepal_Width - .$Petal_Length))
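For anyone who wants to try the semantics without a Spark session, here is a minimal sketch on a plain data.frame. It assumes only that magrittr is installed; keep_rows is a hypothetical helper I made up for illustration (it is not part of SparkR or magrittr) and it uses standard evaluation only.

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes the
# first argument of the call on the right-hand side.
piped  <- c(1, 4, 9) %>% sqrt() %>% sum()
nested <- sum(sqrt(c(1, 4, 9)))

# Hypothetical helper (illustration only): keep the rows where mask is TRUE.
keep_rows <- function(data, mask) data[mask, , drop = FALSE]

df <- data.frame(a = c(1, 5, 3), b = c(2, 2, 2))

# Inside a nested expression the dot refers to the left-hand side, which is
# still passed as the first argument, so this is keep_rows(df, df$a > df$b).
rows_piped  <- df %>% keep_rows(.$a > .$b)
rows_nested <- keep_rows(df, df$a > df$b)
```

Both pairs (piped/nested and rows_piped/rows_nested) should come out identical, which is the same rewrite as in the SparkR example above.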
It is widely used (see the reverse dependencies section at
https://cran.r-project.org/web/packages/magrittr/index.html), stable, and
pretty much a core element of idiomatic R code these days.
Why we might want to add it:
* Improve the readability of SparkR examples which, subjectively speaking,
can look a bit archaic.
* Reduce the verbosity of the SparkR codebase.
Possible risks:
* It is an additional dependency for the CI pipeline.
A: magrittr is already a transitive dependency of the SparkR tests (it
is required by testthat), its API is extremely stable, and it has no
dependencies of its own.
* It is an additional dependency for SparkR installations.
A: Given its widespread usage (over 1200 reverse imports, including some
of the most popular packages), it is probably already part of any but the
most minimal R installation.
While it's just anecdotal evidence, most of the SparkR applications
I've seen out there already use magrittr.
Non-goals:
* Supporting non-standard evaluation.
Thanks in advance for your input.
--
Best regards,
Maciej Szymkiewicz
Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: A30CEF0C31A501EC
