Hi, I have created a custom Hive UDF that has external JAR dependencies. I added those JARs to the Hive session using 'add jar', but when I try to create my function I get a NoClassDefFoundError on a dependency class. I am on Hive 0.8.1 running on Amazon EMR.

This is what happens when I try to create my function:

    hive> add jar /home/hadoop/care/lib/wurfl-1.4.4.3.jar;
    Added /home/hadoop/care/lib/wurfl-1.4.4.3.jar to class path
    Added resource: /home/hadoop/care/lib/wurfl-1.4.4.3.jar
    hive> add jar /home/hadoop/care/my_udf.jar;
    Added /home/hadoop/care/my_udf.jar to class path
    Added resource: /home/hadoop/care/my_udf.jar
    hive> create temporary function is_bot as 'my.package.BotUDF';
    java.lang.NoClassDefFoundError: net/sourceforge/wurfl/core/resource/WURFLResource
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1127)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:307)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:228)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:457)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:732)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
    Caused by: java.lang.ClassNotFoundException: net.sourceforge.wurfl.core.resource.WURFLResource
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 20 more
    FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask

I then checked that Hive can actually find the class it is failing on:

    hive> create temporary function is_bot as 'net.sourceforge.wurfl.core.resource.WURFLResource';
    FAILED: Class net.sourceforge.wurfl.core.resource.WURFLResource does not implement UDF, GenericUDF, or UDAF
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

So Hive can see the class; it only rejects it because it is not a UDF, which is correct.

To confirm that Hive reports a different error when the class does not exist at all, I also tried a non-existent class:

    hive> create temporary function is_bot as 'non.existent.Class';
    FAILED: Class non.existent.Class not found
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

I have not found any documentation indicating that dependencies need to be specified using a different mechanism. https://issues.apache.org/jira/browse/HIVE-2561 mentions an alternate mechanism, but I can't find it described on the wiki or anywhere else.

Any help or pointers are appreciated.

Thanks,
Rupinder