Hi,

I had a look at https://github.com/apache/incubator-zeppelin/pull/208 (and the related GitHub repo https://github.com/elbamos/Zeppelin-With-R [1]).

Here are a few topics for discussion based on my experience developing https://github.com/datalayer/zeppelin-R [2].

1. rscala jar not in Maven Repository

[1] copies the source code (Scala and R) from the rscala repo and changes/extends/repackages it a bit. [2] declares the jar as a system-scoped library. I recently had incompatibility issues between 1.0.8 (the version you get since 2015-12-10 when you install rscala in your R environment) and the 1.0.6 jar I am using as part of the zeppelin-R build. To avoid such issues, why not let the user choose the version via a property at build time, so that it matches the version running on their host? This would also let us benefit from future rscala releases, which fix bugs, bring new features... It also means we don't have to copy the rscala code into the Zeppelin tree.

2. Interpreters

[1] proposes 2 interpreters, %sparkr.r and %sparkr.knitr, which are implemented in their own module apart from the Spark one. To be aligned with the existing pyspark implementation, why not integrate the R code into the Spark module? Is there any reason to keep 2 versions which do basically the same thing? The unique magic keyword would then be %spark.r

3. Rendering TABLE plot when interpreter result is a dataframe

This may be confusing. What if I display a plot and simply want to print the first 10 rows at the end of my code? To keep the same behavior as the other interpreters, we could make this feature optional (disabled by default, enabled via property).
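
To make the opt-in idea concrete, here is a minimal sketch of how such a switch could look. It is not taken from [1] or [2]: the class name, the method, and the property name zeppelin.R.renderDataFrameAsTable are all hypothetical, and a plain java.util.Properties stands in for whatever settings object the interpreter really receives.

```java
import java.util.Properties;

public class TableRenderingToggle {
    // Hypothetical property name, chosen only for this illustration.
    static final String RENDER_TABLE_PROPERTY = "zeppelin.R.renderDataFrameAsTable";

    // Returns true only when the user has opted in AND the result is a dataframe.
    // A missing property is treated as "false", so the feature is off by default.
    static boolean shouldRenderAsTable(Properties props, boolean resultIsDataFrame) {
        boolean enabled =
            Boolean.parseBoolean(props.getProperty(RENDER_TABLE_PROPERTY, "false"));
        return enabled && resultIsDataFrame;
    }

    public static void main(String[] args) {
        // No property set: a dataframe result is printed like any other output.
        Properties defaults = new Properties();
        System.out.println(shouldRenderAsTable(defaults, true));

        // Property explicitly enabled: a dataframe result is rendered as a TABLE.
        Properties optedIn = new Properties();
        optedIn.setProperty(RENDER_TABLE_PROPERTY, "true");
        System.out.println(shouldRenderAsTable(optedIn, true));
    }
}
```

With the default left at "false", a paragraph that draws a plot and ends with the first 10 rows of a dataframe would behave as it does in the other interpreters, and only users who set the property would get the TABLE rendering.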


Thx, Eric
