Thank you for your quick answer. It helped a lot and I managed to create and 
use the UDF. 

I would like to add my conclusions to the process:

Inside the UDF, one should call File f = new File(filename) from the 
evaluate() method and not from the initialize() method, so that all mappers 
can access the distributed cache. The filename should be the name of the file 
prefixed with "./", as in this example: String filename = "./example.xml"
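To make the pattern concrete, here is a minimal plain-Java sketch of the lazy-open approach described above (outside Hive, with a hypothetical class name): the file is opened on the first evaluate() call, where the distributed cache is guaranteed to be populated in the task's working directory, rather than in initialize().

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Minimal analogue of the UDF pattern: defer opening the file until
// evaluate() runs on the task node, instead of doing it in initialize().
public class LazyFileUdfSketch {
    // Files shipped with 'ADD FILE' land in the task's working directory,
    // so the name is prefixed with "./" (hypothetical file name).
    private final String filename = "./example.xml";
    private File file;  // opened lazily on first evaluate()

    public void initialize() {
        // Only argument/type checks belong here; do NOT touch the file yet,
        // since the distributed cache may not be visible at this point.
    }

    public String evaluate() throws IOException {
        if (file == null) {
            file = new File(filename);  // resolved in the mapper's CWD
        }
        return file.exists()
                ? new String(Files.readAllBytes(file.toPath()))
                : null;
    }
}
```

The same lazy-initialization shape carries over to a real GenericUDF: keep initialize() limited to ObjectInspector checks and move all file access into evaluate().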

There's an old open JIRA issue about accessing the distributed cache from 
UDFs - https://issues.apache.org/jira/browse/HIVE-1016. It looks like no 
progress has been made on it lately; maybe Carl could update us otherwise or 
raise its priority.

Thanks again for your help Edward, this issue was really important to me.
Maoz

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Tuesday, May 29, 2012 4:58 PM
To: user@hive.apache.org
Subject: Re: GenericUdf and Jdbc issues

So.....
this.getResourceAsStream(filename) is a very tricky method to get right, 
especially in Hive, where you have the Hive classpath, the Hadoop classpath, 
and the Hive JDBC classpath - and especially when you consider that launched 
map/reduce tasks get their own environment and classpath.
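The failure mode is easy to reproduce standalone: getResourceAsStream() returns null (rather than throwing) when the named resource is not on the calling classloader's classpath, which matches the null pointer seen under the JDBC driver. A tiny sketch, with a hypothetical resource name:

```java
import java.io.InputStream;

// Demonstrates that getResourceAsStream() depends entirely on the
// classloader's classpath: a resource that isn't packaged or visible
// simply comes back as null, with no exception thrown.
public class ResourceLookupDemo {
    public static InputStream lookup(String name) {
        return ResourceLookupDemo.class.getResourceAsStream(name);
    }
}
```

Because each environment (CLI, Hive server, launched tasks) builds a different classpath, the same lookup can succeed in one and silently return null in another.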

I had the same issues when I was writing my geo-ip-udf. See the comments.

https://github.com/edwardcapriolo/hive-geoip/blob/master/src/main/java/com/jointhegrid/udf/geoip/GenericUDFGeoIP.java

I came to the conclusion that if you add a file to the distributed cache using 
'ADD FILE', you can reliably assume it will be in the current working 
directory, and this works:

        File f = new File(database);

I hope this helps.
Edward

On Tue, May 29, 2012 at 8:35 AM, Maoz Gelbart <maoz.gelb...@pursway.com> wrote:
> Hi all,
>
>
>
> I am using Hive 0.7.1 over Cloudera's Hadoop distribution 0.20.2 and 
> MapR hdfs distribution 1.1.1.
>
> I wrote a GenericUDF packaged as a Jar that attempts to open a local 
> resource during its initialization at initialize(ObjectInspector[]
> arguments) command.
>
>
>
> When I run with the CLI, everything is fine.
>
> When I run using Cloudera's Hive-JDBC driver, the UDF fails with a null 
> pointer returned by this.getResourceAsStream(filename).
>
> Removing that line fixed the problem and the UDF ran on both CLI and 
> JDBC, so I believe that "ADD JAR" and "CREATE TEMPORARY FUNCTION" were 
> entered correctly.
>
>
>
> Did anyone observe such a behavior? I have a demo Jar to reproduce the 
> problem if needed.
>
>
>
> Thanks,
>
> Maoz
