Vitaly Polonetsky created ZEPPELIN-1518: -------------------------------------------
             Summary: Lambda expressions are not working on CDH 5.7.x Spark
                 Key: ZEPPELIN-1518
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1518
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.6.1, 0.6.0
            Reporter: Vitaly Polonetsky

CDH 5.7.x backported RpcEnv and eliminated the class server in the Spark 1.6.0 REPL: https://github.com/cloudera/spark/commit/e0d03eb30e03f589407c3cf37317a64f18db8257

An attempted fix was made in https://github.com/apache/zeppelin/commit/78c7b5567e7fb4985cecf147c39033c554dfc208. Although basic Spark operations work in Zeppelin after this fix, the following code now fails:

{quote}
val rdd2 = sc.parallelize(Seq(1, 2, 3, 4, 5))
rdd2.filter(_ > 3).count()
{quote}

The lambda expression is not being shipped to the executors:

{{java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1}}

As far as I understand, Zeppelin supports RpcEnv only on Scala 2.11, via the {{-Yrepl-outdir}} option, which is not available in Scala 2.10.

Another way of supporting RpcEnv could be the spark-submit approach of serving the new classes over RPC. Here's what I've hacked together and have working locally, though I'm having trouble testing my pull request:

1. In {{SparkInterpreter.createSparkContext_1()}}, if {{classServerUri}} is still null after both checks, invoke {{intp.getClassOutputDirectory()}} via reflection.
2. Use the returned value to set the {{spark.repl.class.outputDir}} param on the SparkConf.

The same approach could be used for Spark 2.0 as well, eliminating the additional HTTP server that Zeppelin runs to serve lambda classes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
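A minimal sketch of the two reflection steps above. {{ReplStub}} and the {{/tmp/repl-classes}} path are hypothetical stand-ins for the real CDH REPL interpreter ({{intp}}) and its output directory, and a plain map stands in for SparkConf; only the method name {{getClassOutputDirectory}} and the {{spark.repl.class.outputDir}} key come from the issue text:

```scala
import java.io.File

// Stand-in for the CDH 5.7.x Spark REPL interpreter (intp); in the real
// backported REPL, getClassOutputDirectory returns the directory where
// compiled lambda classes are written.
class ReplStub {
  def getClassOutputDirectory: File = new File("/tmp/repl-classes")
}

object ReflectionSketch {
  def main(args: Array[String]): Unit = {
    val intp: AnyRef = new ReplStub

    // Step 1: classServerUri was null after both checks, so fall back to
    // invoking getClassOutputDirectory() reflectively (the method does not
    // exist on stock Spark 1.6, hence reflection rather than a direct call).
    val method = intp.getClass.getMethod("getClassOutputDirectory")
    val outputDir = method.invoke(intp).asInstanceOf[File]

    // Step 2: set spark.repl.class.outputDir so executors fetch REPL-compiled
    // classes over RPC. A mutable map stands in for sparkConf.set(...).
    val conf = scala.collection.mutable.Map[String, String]()
    conf("spark.repl.class.outputDir") = outputDir.getAbsolutePath

    println(conf("spark.repl.class.outputDir"))
  }
}
```

With the real interpreter in place of the stub, executors would resolve {{$anonfun}} classes from this directory via the driver's RpcEnv instead of the removed HTTP class server.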