Github user elbamos commented on the pull request:
https://github.com/apache/incubator-zeppelin/pull/208#issuecomment-189144007
I think it is "complete" and "working" -- the convenience binary is not
part of the release, and a new one isn't going to be made until there is a new
release anyway. I think the preference is to simply get the working code into
HEAD and then if there are remaining issues fix them on the path to the 0.6
release. But I certainly wouldn't object either way.
Regarding the R directory - yes. There's a lightweight R package created here
to handle the R side of Scala-R communication. I felt that it was better to put
that in a local `R/` directory than to try to install it in the user's or
system's R library. That's the design choice made in Spark itself, so I tried
to match that.
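For illustration only (the helper name and directory layout here are my assumptions, not the actual rzeppelin code): loading a bundled package from a local `R/` directory instead of the user's or system library amounts to passing `lib.loc` to R's `library()`. A small Python sketch that builds that R expression:

```python
import os

def rzeppelin_load_expr(zeppelin_home):
    """Build the R expression that loads the bundled rzeppelin package
    from a local R/ directory under zeppelin_home, rather than from the
    user's or system-wide R library.  (Hypothetical helper for
    illustration; the real interpreter code differs.)"""
    lib_dir = os.path.join(zeppelin_home, "R", "lib")
    return 'library(rzeppelin, lib.loc = "{}")'.format(lib_dir)

print(rzeppelin_load_expr("/opt/zeppelin"))
```

Keeping the package out of the shared R library means two Zeppelin installations can't clobber each other's copy, which mirrors the choice Spark itself made for SparkR.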
Regarding SPARK_HOME, yes and yes. To talk to spark, R has to have a
version of the SparkR R package that matches the running version of Spark.
That lives under the `R/` subdirectory of the spark binary home, in the spark
distribution. So at startup, we need to know where the current spark binary
is, so we can load the correct SparkR package. (We could *try* to run without
confirming the version, but in practice this generated support issues without a
corresponding benefit.)
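As a rough sketch of that lookup (Python, purely illustrative; the real interpreter is not written this way), finding the SparkR package that matches the running Spark means looking under the Spark distribution's own `R/` subdirectory:

```python
import os

def sparkr_lib_dir(spark_home):
    """Return the directory holding the SparkR package shipped inside
    the Spark binary distribution at spark_home.  Loading SparkR from
    here, rather than from the user's R library, keeps the package
    version in lockstep with the running Spark version."""
    # In a Spark binary distribution, SparkR is installed under
    # <SPARK_HOME>/R/lib.
    return os.path.join(spark_home, "R", "lib")

print(sparkr_lib_dir("/opt/spark"))
```

The point of deriving the path from SPARK_HOME, rather than accepting whatever SparkR happens to be installed, is exactly the version-matching requirement described above.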
Why always require SPARK_HOME, rather than looking for an R directory under
a spark subdirectory of Zeppelin? Because doing that generated a torrent of
support issues that arose when the user had an inconsistent Spark
configuration or was running a version of Spark other than the one they
compiled against. Part of the reason is that which Spark installation
actually gets used is not completely defined by Zeppelin, and it's changed
more times than I think folks realize, including unintentionally. (For
example, compare the way the pyspark interpreter and SparkInterpreter try to
locate Spark.) Requiring that SPARK_HOME be set produces the only unambiguous
configuration. My understanding, from correspondence with Moon in
August/September, is that spark-under-zeppelin is effectively deprecated,
partly for this reason, and will soon be removed, so that Zeppelin always and
only launches Spark from SPARK_HOME. So to simplify administration and the
user experience, the interpreter requires SPARK_HOME, checks for it, and
noisily declines to connect to Spark if it isn't set.
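A minimal sketch of that "noisy decline" behaviour (Python with hypothetical names, just to show the shape of the check; the actual interpreter code differs):

```python
import os

def require_spark_home():
    """Fail loudly at interpreter startup if SPARK_HOME is unset,
    rather than silently guessing which Spark installation to use.
    (Hypothetical helper for illustration.)"""
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError(
            "SPARK_HOME is not set; refusing to connect to Spark. "
            "Set SPARK_HOME to the Spark installation Zeppelin should use.")
    return spark_home
```

Failing fast here turns a class of confusing version-mismatch bugs into a single clear configuration error.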
So, to sum up -- there are a couple of relevant `R/` directories. One is part
of the spark distribution and contains the SparkR package for connecting R to
the correct version of Spark. There is an analogous one created for the R
interpreter, under ZEPPELIN_HOME, for the rzeppelin package, which handles the
R side of the Zeppelin-R interface. During startup, we force loading these
packages from those directories so we load the correct versions.
@bzz Thank you for the reminder! I'll take care of it in the next day or
so.