So... this.getResourceAsStream(filename) is a very tricky method to get right, especially in Hive, where you have the Hive classpath, the Hadoop classpath, and the Hive JDBC classpath, and especially when you consider that launched map/reduce tasks get their own environment and classpath.
I had the same issues when I was writing my geo-ip UDF; see the comments in
https://github.com/edwardcapriolo/hive-geoip/blob/master/src/main/java/com/jointhegrid/udf/geoip/GenericUDFGeoIP.java

I came to the conclusion that if you add a file to the distributed cache using 'ADD FILE', you can reliably assume it will be in the current working directory, so this works:

File f = new File(database);

I hope this helps.

Edward

On Tue, May 29, 2012 at 8:35 AM, Maoz Gelbart <maoz.gelb...@pursway.com> wrote:
> Hi all,
>
> I am using Hive 0.7.1 over Cloudera’s Hadoop distribution 0.20.2 and MapR
> hdfs distribution 1.1.1.
>
> I wrote a GenericUDF packaged as a jar that attempts to open a local
> resource during its initialization, in initialize(ObjectInspector[]
> arguments).
>
> When I run with the CLI, everything is fine.
>
> When I run using Cloudera’s Hive JDBC driver, the UDF fails with a null
> pointer returned from the call this.getResourceAsStream(filename).
>
> Removing the line fixed the problem and the UDF ran on both the CLI and
> JDBC, so I believe that "ADD JAR" and "CREATE TEMPORARY FUNCTION" were
> entered correctly.
>
> Did anyone observe such a behavior? I have a demo jar to reproduce the
> problem if needed.
>
> Thanks,
>
> Maoz
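For reference, here is a minimal sketch of the pattern Edward describes: a hypothetical GenericUDF (the class name, file name, and lookup logic are illustrative, not from the thread) that loads a file shipped with ADD FILE by opening it from the task's current working directory instead of the classpath.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class GenericUDFLookup extends GenericUDF {

  // Hypothetical file name; it must match the file registered with ADD FILE.
  private static final String LOOKUP_FILE = "lookup.txt";

  private final Map<String, String> lookup = new HashMap<String, String>();

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    // Files added with 'ADD FILE' go into the distributed cache and are
    // made available in the task's working directory, so a plain relative
    // path works both on the CLI and in launched map/reduce tasks.
    File f = new File(LOOKUP_FILE);
    if (f.exists()) {
      try {
        BufferedReader reader = new BufferedReader(new FileReader(f));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            // Expect tab-separated key/value pairs (illustrative format).
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
              lookup.put(parts[0], parts[1]);
            }
          }
        } finally {
          reader.close();
        }
      } catch (IOException e) {
        throw new UDFArgumentException("could not read " + LOOKUP_FILE
            + ": " + e.getMessage());
      }
    }
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object arg = arguments[0].get();
    if (arg == null) {
      return null;
    }
    String value = lookup.get(arg.toString());
    return value == null ? null : new Text(value);
  }

  @Override
  public String getDisplayString(String[] children) {
    return "lookup(" + children[0] + ")";
  }
}

A corresponding session would look something like the following (paths and the function name are placeholders):

ADD FILE /local/path/lookup.txt;
ADD JAR /local/path/lookup-udf.jar;
CREATE TEMPORARY FUNCTION my_lookup AS 'GenericUDFLookup';
SELECT my_lookup(key_column) FROM some_table;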