Denis Efarov created ZEPPELIN-3591:
--------------------------------------

             Summary: Some values of "args" property in interpreter settings 
for Spark ruin UDF execution
                 Key: ZEPPELIN-3591
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3591
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.7.2
         Environment: CentOS Linux 7.3.1611

Java 1.8.0_60

Scala 2.11.8

Spark 2.1.1

Hadoop 2.6.0

Zeppelin 0.7.2

            Reporter: Denis Efarov


Any value of the "args" interpreter configuration property that starts with a "-" (minus) sign prevents correct UDF execution in Spark running on YARN. The text after the "-" does not matter; execution fails regardless. No other property affects this.

Steps to reproduce: 
 * On the interpreter settings page, find Spark interpreter
 * For "args" property, put any value starting with "-", for example "-test" 
 * Make sure Spark runs on YARN (master=yarn-client)
 * Save settings and restart the interpreter
 * In any notebook, write and execute the following code:
 ** %spark
{code:scala}
val udfDemo = (i: Int) => i + 10
sqlContext.udf.register("demoUdf", udfDemo)
sqlContext.sql("select demoUdf(1) val").show
{code}

Stacktrace:

{code}
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}

As a workaround, declaring the same UDF in a %pyspark paragraph works, even when the UDF is then invoked from a %spark paragraph.
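The workaround can be sketched as the following Zeppelin paragraph. This is an illustrative sketch only, not verified against this environment; it assumes the same {{sqlContext}} is shared between the %spark and %pyspark interpreters in the same interpreter group, and the lambda body mirrors the Scala UDF from the reproduction steps:

{code:python}
%pyspark
# Register the UDF through PySpark instead of the Scala shell.
# Because the registration goes into the shared SQL context, the
# UDF can afterwards be called from a %spark paragraph as well.
sqlContext.udf.register("demoUdf", lambda i: i + 10)
sqlContext.sql("select demoUdf(1) val").show()
{code}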



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
