Thank you for your quick answer. It helped a lot and I managed to create and use the UDF.
I would like to add my conclusions to the process:

Inside the UDF, one should call File f = new File(filename) from the evaluate() method and not from the initialize() method, to ensure that all mappers can access the distributed cache. The filename should be the name of the file prefixed with "./", as in this example: String filename = "./example.xml";

There is an old open JIRA issue regarding the ability to access the distributed cache from UDFs: https://issues.apache.org/jira/browse/HIVE-1016. It looks like no progress has been made on it lately; maybe Carl could update us otherwise or raise its priority.

Thanks again for your help Edward, this issue was really important to me.

Maoz

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Tuesday, May 29, 2012 4:58 PM
To: user@hive.apache.org
Subject: Re: GenericUdf and Jdbc issues

So..... this.getResourceAsStream(filename) is a very tricky method to get right, especially in Hive, where you have the hive classpath, the hadoop classpath, and the hive-jdbc classpath. It gets even trickier when you consider that launched map/reduce tasks get their own environment and classpath. I had the same issues when I was writing my geo-ip UDF. See the comments:

https://github.com/edwardcapriolo/hive-geoip/blob/master/src/main/java/com/jointhegrid/udf/geoip/GenericUDFGeoIP.java

I came to the conclusion that if you add a file to the distributed cache using 'ADD FILE', you can reliably assume it will be in the current working directory, and this works:

File f = new File(database);

I hope this helps.

Edward

On Tue, May 29, 2012 at 8:35 AM, Maoz Gelbart <maoz.gelb...@pursway.com> wrote:
> Hi all,
>
> I am using Hive 0.7.1 over Cloudera's Hadoop distribution 0.20.2 and
> MapR hdfs distribution 1.1.1.
>
> I wrote a GenericUDF packaged as a Jar that attempts to open a local
> resource during its initialization, in initialize(ObjectInspector[]
> arguments).
>
> When I run with the CLI, everything is fine.
>
> When I run using Cloudera's Hive JDBC driver, the UDF fails with a null
> pointer returned from the call this.getResourceAsStream(filename).
>
> Removing that line fixed the problem and the UDF ran on both the CLI and
> JDBC, so I believe that "ADD JAR" and "CREATE TEMPORARY FUNCTION" were
> entered correctly.
>
> Did anyone observe such behavior? I have a demo Jar to reproduce the
> problem if needed.
>
> Thanks,
>
> Maoz
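[Editor's note: to make the lazy-loading pattern from the conclusions above concrete, here is a minimal Hive-free sketch in plain Java. The class and method names are illustrative only, not the real Hive GenericUDF API, and "./example.xml" is just the sample filename from the thread. The point is the shape: initialize() does no file I/O, and the side file is opened on the first call to evaluate(), relative to the current working directory, which is where 'ADD FILE' places cached files on the task nodes.]

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch of the pattern only: defer opening the side file until the
// first evaluate() call, so the open happens on the task node after
// the distributed cache has populated the working directory.
public class LazyCacheFileUdf {
    // Relative path: 'ADD FILE example.xml' makes the file available
    // in the task's current working directory.
    private static final String FILENAME = "./example.xml";

    private String cached; // file contents, loaded lazily

    public void initialize() {
        // Only argument/type checks belong here. Do NOT open FILENAME:
        // initialize() may run on the client, where the file is absent.
    }

    public String evaluate(String key) throws IOException {
        if (cached == null) {
            File f = new File(FILENAME); // resolved against the task's CWD
            cached = new String(Files.readAllBytes(f.toPath()));
        }
        // Toy lookup: return the key if the file mentions it, else null.
        return cached.contains(key) ? key : null;
    }
}
```

A real GenericUDF would do this inside the Hive evaluate(DeferredObject[]) override, but the lazy-open structure is the same.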