Thank you for your answer. The plans are identical for Hive 1.0.0 and Hive 1.1.0.
You're right, Hive-1.1.0 does not start a MapReduce job for the query, while Hive-1.0.0 does. Should I file a JIRA for this issue?

2015-05-07 21:17 GMT+02:00 Jason Dere <jd...@hortonworks.com>:

> Is this on Hive CLI, or using HiveServer2?
>
> Can you run "explain select in_file('a', './testfile') from a;" from
> both Hive 1.0.0 and Hive 1.1.0 and see if they look different?
> One possible thing that might be happening here is that in Hive-1.1.0,
> this query is being executed without the need for a map/reduce job, in
> which case the working directory for the query is probably the local
> working directory from when Hive was invoked. I don't think the Distributed
> Cache will be working correctly in this case, because the UDF is not
> running in a map/reduce task.
>
> If a map/reduce job is kicked off for the query and the UDF is running
> in this m/r task environment, then the distributed cache will likely be
> working fine.
>
> If there is a way to ensure the query with your UDF runs as part of a
> map/reduce job, this may do the trick. Adding an order-by will do it, but
> other people on this list may have a better way of making this happen.
>
>
> On May 7, 2015, at 3:28 AM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>
> Does this error occur for anyone else? It might be a serious issue.
>
> 2015-05-05 13:59 GMT+02:00 Zsolt Tóth <toth.zsolt....@gmail.com>:
>
>> Hi,
>>
>> I've just upgraded to Hive 1.1.0 and it looks like there is a problem
>> with the distributed cache.
>> I use ADD FILE, then a UDF that wants to read the file. The following
>> syntax works in Hive 1.0.0 but Hive can't find the file in 1.1.0 (testfile
>> exists on HDFS, the built-in UDF in_file is just an example):
>>
>> add file hdfs:///tmp/testfile;
>> select in_file('a', './testfile') from a;
>>
>> However, it works with the local path:
>>
>> select in_file('a',
>> '/tmp/462e6854-10f3-4a68-a290-615e6e9d60ff_resources/testfile') from a;
>>
>> When I try to list the files in the directory "./" in Hive 1.1.0, it
>> lists the cluster's root directory. It looks like the working directory
>> changed in Hive 1.1.0. Is this intended? If so, how can I access the files
>> in the distributed cache added with ADD FILE?
>>
>> Regards,
>> Zsolt