Zsolt Tóth created HIVE-12045:
---------------------------------

             Summary: ClassNotFound for GenericUDF in "select distinct..." 
queries (Hive on Spark)
                 Key: HIVE-12045
                 URL: https://issues.apache.org/jira/browse/HIVE-12045
             Project: Hive
          Issue Type: Bug
          Components: Spark
         Environment: Cloudera QuickStart VM, beeline
            Reporter: Zsolt Tóth


If I execute the following query in beeline, I get ClassNotFoundException for 
the UDF class.
{code}
drop function myGenericUdf;
create function myGenericUdf as 'org.example.myGenericUdf' using jar 
'hdfs:///tmp/myudf.jar';
select distinct myGenericUdf(1,2,1) from mytable;
{code}
In my example, myGenericUdf just looks for the 1st argument's value in the 
others and returns the index. I don't think this is related to the actual 
GenericUDF function.

Note that:
"select myGenericUdf(1,2,1) from mytable;" succeeds
If I use the non-generic implementation of the same UDF, the select distinct 
call succeeds.

StackTrace:
{code}
15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: 
org.example.myGenericUDF
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: 
org.example.myGenericUDF
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
        at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1069)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:960)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:974)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:416)
        at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:296)
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
        at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:505)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
        at 
org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
        at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:206)
        at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:204)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.dependencies(RDD.scala:204)
        at 
org.apache.spark.scheduler.DAGScheduler.visit$2(DAGScheduler.scala:338)
        at 
org.apache.spark.scheduler.DAGScheduler.getAncestorShuffleDependencies(DAGScheduler.scala:355)
        at 
org.apache.spark.scheduler.DAGScheduler.registerShuffleDependencies(DAGScheduler.scala:317)
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:218)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:301)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:298)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at 
org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:298)
        at 
org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:310)
        at 
org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:244)
        at 
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:731)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Caused by: java.lang.ClassNotFoundException: org.example.myGenericUDF
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
        ... 72 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to