Hi, Sorry for the delay but I finally got around to testing these queries with Hive version 11. Things are improved. Two of the three queries now run fine. However one query still fails. So this query runs fine:
SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; But this one (which is _very_ similar) SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; fails with this error: > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; Automatically selecting local only mode for query Total MapReduce jobs = 1 setting HADOOP_USER_NAME pmarron 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore. Execution log at: /tmp/pmarron/.log 2013-06-25 10:52:56 Starting to launch local task to process map join; maximum memory = 932118528 java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366) at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68) at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Execution failed with exit status: 2 Obtaining error information Task failed! Task ID: Stage-4 Logs: /tmp/pmarron/hive.log FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask There really doesn't seem to be anything helpful in the logs either. It seems a little weird that it can find the virtual column in the first table, but not the second. Again, these are not blocking me. I'm just reporting these results as they may expose a bug. Regards, Z From: Ashutosh Chauhan [mailto:hashut...@apache.org] Sent: 10 June 2013 16:48 To: user@hive.apache.org Subject: Re: Use of virtual columns in joins You might be hitting into https://issues.apache.org/jira/browse/HIVE-4033 in which case its recommended that you upgrade to 0.11 where in this bug is fixed. On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron <peter.mar...@trilliumsoftware.com<mailto:peter.mar...@trilliumsoftware.com>> wrote: Hi, I'm using hive 0.10.0 over hadoop 1.0.4. I have created a couple of test tables and found that various join queries that refer to virtual columns fail. For example the query: SELECT * FROM a JOIN b ON b.rownumber = a.number; works but the following three queries all fail. SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number; SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE; They all fail in the same way, but I am too much of a newb to be able to tell much from the error message: Error during job, obtaining debugging information... Execution failed with exit status: 2 Obtaining error information Task failed! Task ID: Stage-1 Logs: /tmp/pmarron/hive.log When I look in the log I can find this: 2013-06-07 14:06:22,831 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(298)) - job_local_0001 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 1,0 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 1,0 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) ... 4 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516) ... 5 more and I've looked at the source code referred to, but it doesn't mean much to me, I'm afraid. For completeness here's a description of the tables: > select * from a; OK first 1 primo second 2 secondo third 3 terzo fourth 4 quarto fifth 5 quinto sitxh 6 sesto seventh 7 settimo eigthth 8 ottavo ninth 9 nono tenth 10 decimo Time taken: 0.105 seconds hive> describe extended a; OK english string number bigint italian string hive> select * from b; OK 1 0 2 14 3 31 4 45 5 61 6 77 7 91 8 109 9 126 10 139 Time taken: 0.067 seconds hive> describe b; OK rownumber bigint offset bigint Time taken: 0.072 seconds hive> These queries aren't actually important to me, as I am taking a different approach. But I thought that it might be important to mention these failures if they expose a bug. Or maybe I'll learn that I'm doing something and there's a way to get these joins to work... Regards, Z