Yes, it's a bug. I've booked on https://issues.apache.org/jira/browse/HIVE-4790.
2013/6/25 Peter Marron <peter.mar...@trilliumsoftware.com>: > Hi, > > > > Sorry for the delay but I finally got around to testing these queries with > Hive version 11. > > Things are improved. Two of the three queries now run fine. However one > query still fails. > > So this query runs fine: > > > > SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON > b.rownumber = a.number; > > But this one (which is _very_ similar) > > > > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON > b.rownumber = a.number; > > fails with this error: > > > > > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = > a.number; > > Automatically selecting local only mode for query > > Total MapReduce jobs = 1 > > setting HADOOP_USER_NAME pmarron > > 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property > hive.metastore.local no longer has any effect. Make sure to provide a valid > value for hive.metastore.uris if you are connecting to a remote metastore. > > Execution log at: /tmp/pmarron/.log > > 2013-06-25 10:52:56 Starting to launch local task to process map join; > maximum memory = 932118528 > > java.lang.RuntimeException: cannot find field block__offset__inside__file > from [0:rownumber, 1:offset] > > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366) > > at > org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) > > at > org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74) > > at > org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) > > at > org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68) > > at > org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222) > > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) > > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) > > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186) > > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > > at > org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394) > > at > org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277) > > at > org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > > Execution failed with exit status: 2 > > Obtaining error information > > > > Task failed! > > Task ID: > > Stage-4 > > > > Logs: > > > > /tmp/pmarron/hive.log > > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.MapredLocalTask > > > > There really doesn’t seem to be anything helpful in the logs either. > > It seems a little weird that it can find the virtual column in the first > table, but not the second. > > Again, these are not blocking me. I’m just reporting these results as they > may expose a bug. > > > > Regards, > > > > Z > > > > From: Ashutosh Chauhan [mailto:hashut...@apache.org] > Sent: 10 June 2013 16:48 > To: user@hive.apache.org > Subject: Re: Use of virtual columns in joins > > > > You might be hitting into https://issues.apache.org/jira/browse/HIVE-4033 in > which case its recommended that you upgrade to 0.11 where in this bug is > fixed. > > > > On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron > <peter.mar...@trilliumsoftware.com> wrote: > > Hi, > > > > I’m using hive 0.10.0 over hadoop 1.0.4. > > > > I have created a couple of test tables and found that various join queries > > that refer to virtual columns fail. For example the query: > > > > SELECT * FROM a JOIN b ON b.rownumber = a.number; > > > > works but the following three queries all fail. > > > > SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = > a.number; > > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = > a.number; > > SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE; > > > > They all fail in the same way, but I am too much of a newb to be able to > > tell much from the error message: > > > > Error during job, obtaining debugging information... > > Execution failed with exit status: 2 > > Obtaining error information > > > > Task failed! > > Task ID: > > Stage-1 > > > > Logs: > > > > /tmp/pmarron/hive.log > > > > When I look in the log I can find this: > > > > 2013-06-07 14:06:22,831 WARN mapred.LocalJobRunner > (LocalJobRunner.java:run(298)) - job_local_0001 > > java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing writable 1,0 > > at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) > > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) > > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) > > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing writable 1,0 > > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539) > > at > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) > > ... 4 more > > Caused by: java.lang.NullPointerException > > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516) > > ... 5 more > > > > and I’ve looked at the source code referred to, but it doesn’t mean much to > me, I’m afraid. > > > > For completeness here’s a description of the tables: > > > > > select * from a; > > OK > > first 1 primo > > second 2 secondo > > third 3 terzo > > fourth 4 quarto > > fifth 5 quinto > > sitxh 6 sesto > > seventh 7 settimo > > eigthth 8 ottavo > > ninth 9 nono > > tenth 10 decimo > > Time taken: 0.105 seconds > > hive> describe extended a; > > OK > > english string > > number bigint > > italian string > > > > hive> select * from b; > > OK > > 1 0 > > 2 14 > > 3 31 > > 4 45 > > 5 61 > > 6 77 > > 7 91 > > 8 109 > > 9 126 > > 10 139 > > Time taken: 0.067 seconds > > hive> describe b; > > OK > > rownumber bigint > > offset bigint > > Time taken: 0.072 seconds > > hive> > > > > These queries aren’t actually important to me, as I am taking a different > approach. > > But I thought that it might be important to mention these failures if they > expose > > a bug. Or maybe I’ll learn that I’m doing something and there’s a way to get > these > > joins to work… > > > > Regards, > > > > Z > > > >