Github user elbamos commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/208#issuecomment-189144007
I think it is "complete" and "working" -- the convenience binary is not
part of the release, and a new one isn't going to be made until there is a new
release anyway. I think the preference is to simply get the working code into
HEAD and then if there are remaining issues fix them on the path to the 0.6
release. But I certainly wouldn't object either way.
Regarding the R directory - yes. There's a lightweight R package created here
to handle the R side of Scala-R communication. I felt that it was better to put
that in a local `R/` directory than to try to install it in the user's or
system's R library. That's the design choice made in Spark itself, so I tried
to match that.
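For illustration only (the helper name and directory layout here are my assumptions, not the actual rzeppelin code): loading a bundled package from a local `R/` directory instead of the user's or system library amounts to passing `lib.loc` to R's `library()`. A small Python sketch that builds that R expression:

```python
import os

def rzeppelin_load_expr(zeppelin_home):
    """Build the R expression that loads the bundled rzeppelin package
    from a local R/ directory under zeppelin_home, rather than from the
    user's or system-wide R library.  (Hypothetical helper for
    illustration; the real interpreter code differs.)"""
    lib_dir = os.path.join(zeppelin_home, "R", "lib")
    return 'library(rzeppelin, lib.loc = "{}")'.format(lib_dir)

print(rzeppelin_load_expr("/opt/zeppelin"))
```

Keeping the package out of the shared R library means two Zeppelin installations can't clobber each other's copy, which mirrors the choice Spark itself made for SparkR.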
Regarding SPARK_HOME, yes and yes. To talk to spark, R has to have a
version of the SparkR R package that matches the running version of Spark.
That lives under the `R/` subdirectory of the spark binary home, in the spark
distribution. So at startup, we need to know where the current spark binary
is, so we can load the correct SparkR package. (We could *try* to run without
confirming the version, but in practice this generated support issues without a
corresponding benefit.)
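As a rough sketch of that lookup (Python, purely illustrative; the real interpreter is not written this way), finding the SparkR package that matches the running Spark means looking under the Spark distribution's own `R/` subdirectory:

```python
import os

def sparkr_lib_dir(spark_home):
    """Return the directory holding the SparkR package shipped inside
    the Spark binary distribution at spark_home.  Loading SparkR from
    here, rather than from the user's R library, keeps the package
    version in lockstep with the running Spark version."""
    # In a Spark binary distribution, SparkR is installed under
    # <SPARK_HOME>/R/lib.
    return os.path.join(spark_home, "R", "lib")

print(sparkr_lib_dir("/opt/spark"))
```

The point of deriving the path from SPARK_HOME, rather than accepting whatever SparkR happens to be installed, is exactly the version-matching requirement described above.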
Why always require SPARK_HOME, rather than looking for an R directory under
a spark subdirectory of Zeppelin? Because doing that generated a torrent of
support issues that arose when the user had an inconsistent Spark
configuration or was running a version of Spark other than the one they
compiled against. Part of the reason is that which Spark installation
actually gets used is not completely defined by Zeppelin, and it's changed
more times than I think folks realize, including unintentionally. (For
example, compare the way the pyspark interpreter and SparkInterpreter try to
locate Spark.) Requiring that SPARK_HOME be set produces the only unambiguous
configuration. My understanding, from correspondence with Moon in
August/September, is that spark-under-zeppelin is effectively deprecated,
partly for this reason, and will soon be removed, so that Zeppelin always and
only launches Spark from SPARK_HOME. So to simplify administration and the
user experience, the interpreter requires SPARK_HOME, checks for it, and
noisily declines to connect to Spark if it isn't set.
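A minimal sketch of that "noisy decline" behaviour (Python with hypothetical names, just to show the shape of the check; the actual interpreter code differs):

```python
import os

def require_spark_home():
    """Fail loudly at interpreter startup if SPARK_HOME is unset,
    rather than silently guessing which Spark installation to use.
    (Hypothetical helper for illustration.)"""
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError(
            "SPARK_HOME is not set; refusing to connect to Spark. "
            "Set SPARK_HOME to the Spark installation Zeppelin should use.")
    return spark_home
```

Failing fast here turns a class of confusing version-mismatch bugs into a single clear configuration error.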
So, to sum up -- there are a couple of relevant `R/` directories. One is part
of the spark distribution and contains the SparkR package for connecting R to
the correct version of Spark. There is an analogous one created for the R
interpreter, under ZEPPELIN_HOME, for the rzeppelin package, which handles the
R side of the Zeppelin-R interface. During startup, we force loading these
packages from those directories so we load the correct versions.
@bzz Thank you for the reminder! I'll take care of it in the next day or
so.