Hi,
Sorry for the delay but I finally got around to testing these queries with Hive
version 11.
Things are improved. Two of the three queries now run fine. However one query
still fails.
So this query runs fine:
SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON
b.rownumber = a.number;
But this one (which is _very_ similar)
SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON
b.rownumber = a.number;
fails with this error:
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber =
a.number;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
setting HADOOP_USER_NAME pmarron
13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property
hive.metastore.local no longer has any effect. Make sure to provide a valid
value for hive.metastore.uris if you are connecting to a remote metastore.
Execution log at: /tmp/pmarron/.log
2013-06-25 10:52:56 Starting to launch local task to process map join;
maximum memory = 932118528
java.lang.RuntimeException: cannot find field block__offset__inside__file from
[0:rownumber, 1:offset]
at
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
at
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
at
org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
at
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
at
org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
at
org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at
org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
at
org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-4
Logs:
/tmp/pmarron/hive.log
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapredLocalTask
There really doesn't seem to be anything helpful in the logs either.
It seems a little weird that it can find the virtual column in the first table,
but not the second.
Again, these are not blocking me. I'm just reporting these results as they may
expose a bug.
Regards,
Z
From: Ashutosh Chauhan [mailto:[email protected]]
Sent: 10 June 2013 16:48
To: [email protected]
Subject: Re: Use of virtual columns in joins
You might be hitting into https://issues.apache.org/jira/browse/HIVE-4033 in
which case its recommended that you upgrade to 0.11 where in this bug is fixed.
On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron
<[email protected]<mailto:[email protected]>>
wrote:
Hi,
I'm using hive 0.10.0 over hadoop 1.0.4.
I have created a couple of test tables and found that various join queries
that refer to virtual columns fail. For example the query:
SELECT * FROM a JOIN b ON b.rownumber = a.number;
works but the following three queries all fail.
SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE;
They all fail in the same way, but I am too much of a newb to be able to
tell much from the error message:
Error during job, obtaining debugging information...
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-1
Logs:
/tmp/pmarron/hive.log
When I look in the log I can find this:
2013-06-07 14:06:22,831 WARN mapred.LocalJobRunner
(LocalJobRunner.java:run(298)) - job_local_0001
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing writable 1,0
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing writable 1,0
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 4 more
Caused by: java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516)
... 5 more
and I've looked at the source code referred to, but it doesn't mean much to me,
I'm afraid.
For completeness here's a description of the tables:
> select * from a;
OK
first 1 primo
second 2 secondo
third 3 terzo
fourth 4 quarto
fifth 5 quinto
sitxh 6 sesto
seventh 7 settimo
eigthth 8 ottavo
ninth 9 nono
tenth 10 decimo
Time taken: 0.105 seconds
hive> describe extended a;
OK
english string
number bigint
italian string
hive> select * from b;
OK
1 0
2 14
3 31
4 45
5 61
6 77
7 91
8 109
9 126
10 139
Time taken: 0.067 seconds
hive> describe b;
OK
rownumber bigint
offset bigint
Time taken: 0.072 seconds
hive>
These queries aren't actually important to me, as I am taking a different
approach.
But I thought that it might be important to mention these failures if they
expose
a bug. Or maybe I'll learn that I'm doing something and there's a way to get
these
joins to work...
Regards,
Z