Denis Efarov created ZEPPELIN-3591:
--------------------------------------

             Summary: Some values of "args" property in interpreter settings for Spark ruin UDF execution
                 Key: ZEPPELIN-3591
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3591
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.7.2
         Environment: CentOS Linux 7.3.1611
Java 1.8.0_60
Scala 2.11.8
Spark 2.1.1
Hadoop 2.6.0
Zeppelin 0.7.2
            Reporter: Denis Efarov


In the "args" interpreter configuration property, any value that starts with a "-" (minus) sign prevents correct UDF execution in Spark running on YARN. The text after the "-" does not matter; execution fails either way. No other property triggers this behavior.

Steps to reproduce:
 * On the interpreter settings page, find the Spark interpreter
 * For the "args" property, set any value starting with "-", for example "-test"
 * Make sure Spark starts on YARN (master=yarn-client)
 * Save the settings and restart the interpreter
 * In any notebook, write and execute the following code:

{code:scala}
%spark
val udfDemo = (i: Int) => i + 10
sqlContext.udf.register("demoUdf", udfDemo)
sqlContext.sql("select demoUdf(1) val").show
{code}

Stacktrace:

{noformat}
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}

Declaring the same UDF in, for example, the %pyspark interpreter works around the problem, even if the query is then executed in %spark.
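For illustration, a minimal sketch of that workaround, assuming Zeppelin's default setup in which %spark and %pyspark share a single SparkContext and SQLContext; the explicit IntegerType return type is an assumption added here (PySpark's udf.register defaults to StringType), not something from the original report:

{code:python}
%pyspark
from pyspark.sql.types import IntegerType

# Register the UDF from the Python side; with the shared SQLContext it is
# visible to SQL queries issued from the other interpreters in the group.
# The return type is declared explicitly because PySpark defaults to StringType.
sqlContext.udf.register("demoUdf", lambda i: i + 10, IntegerType())
{code}

{code:scala}
%spark
// Per the report, the original query now runs without the ClassCastException.
sqlContext.sql("select demoUdf(1) val").show
{code}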