Vitaly Polonetsky created ZEPPELIN-1518:
-------------------------------------------

             Summary: Lambda expressions are not working on CDH 5.7.x Spark
                 Key: ZEPPELIN-1518
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1518
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.6.1, 0.6.0
            Reporter: Vitaly Polonetsky


CDH 5.7.x backported RpcEnv and eliminated the class server in Spark 1.6.0 REPL:
https://github.com/cloudera/spark/commit/e0d03eb30e03f589407c3cf37317a64f18db8257

An attempted fix was performed:
https://github.com/apache/zeppelin/commit/78c7b5567e7fb4985cecf147c39033c554dfc208

Although basic Spark operations work in Zeppelin after this fix, the 
following code now fails:
{code}
val rdd2 = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd2.filter(_ > 3).count()
{code}

The lambda expression's class is not being shipped to the executors:
{{java.lang.ClassNotFoundException: 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1}}

As far as I understand, Zeppelin supports the RpcEnv approach only on Scala 
2.11, via the {{-Yrepl-outdir}} REPL option, which is not available in Scala 
2.10.

Another way to support RpcEnv would be to reuse the spark-submit mechanism 
for fetching the new classes over RPC. Here's what I've hacked together and 
have working locally, though I'm having trouble testing my pull request:
1. In {{SparkInterpreter.createSparkContext_1()}}, if {{classServerUri}} is 
still null after both checks, invoke {{intp.getClassOutputDirectory()}} via 
reflection
2. Use the returned value to set the {{spark.repl.class.outputDir}} param on 
the SparkConf
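The two steps above can be sketched roughly as follows. This is a hypothetical sketch, not the actual patch: the method name {{getClassOutputDirectory}} is assumed from the CDH-patched Spark REPL, {{intp}} stands in for the REPL interpreter instance, and the real change would live inside {{createSparkContext_1()}}:

```java
import java.io.File;
import java.lang.reflect.Method;

// Hypothetical sketch of the proposed fallback (not the actual patch):
// when classServerUri is still null, ask the REPL interpreter object for
// its class output directory via reflection, so the caller can set it as
// spark.repl.class.outputDir on the SparkConf.
public class ReplOutputDirFallback {

    // 'intp' stands in for the SparkIMain REPL instance; the method name
    // getClassOutputDirectory is assumed from the CDH-patched Spark REPL.
    public static String resolveOutputDir(Object intp) {
        try {
            Method m = intp.getClass().getMethod("getClassOutputDirectory");
            m.setAccessible(true);
            File dir = (File) m.invoke(intp);
            return dir == null ? null : dir.getAbsolutePath();
        } catch (ReflectiveOperationException e) {
            // Method absent: not a CDH-style RpcEnv REPL, keep old behavior
            return null;
        }
    }
}
```

The caller would then set {{spark.repl.class.outputDir}} on the SparkConf only when a directory was actually resolved, leaving the existing class-server path untouched otherwise.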

The same approach could be used for Spark 2.0 as well, eliminating the 
additional HTTP server that Zeppelin runs to serve lambda classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
