RE: Use of virtual columns in joins

Peter Marron Tue, 25 Jun 2013 02:57:44 -0700

Hi,

Sorry for the delay but I finally got around to testing these queries with Hive 
version 11.
Things are improved. Two of the three queries now run fine. However one query 
still fails.
So this query runs fine:


                SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
b.rownumber = a.number;
But this one (which is _very_ similar)

                SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
b.rownumber = a.number;
fails with this error:

    > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = 
a.number;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
setting HADOOP_USER_NAME        pmarron
13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property 
hive.metastore.local no longer has any effect. Make sure to provide a valid 
value for hive.metastore.uris if you are connecting to a remote metastore.
Execution log at: /tmp/pmarron/.log
2013-06-25 10:52:56     Starting to launch local task to process map join;      
maximum memory = 932118528
java.lang.RuntimeException: cannot find field block__offset__inside__file from 
[0:rownumber, 1:offset]
        at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
        at 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
        at 
org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at 
org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
        at 
org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
        at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
        at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
        at 
org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
        at 
org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
  Stage-4

Logs:

/tmp/pmarron/hive.log
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapredLocalTask

There really doesn't seem to be anything helpful in the logs either.
It seems a little weird that it can find the virtual column in the first table, 
but not the second.
Again, these are not blocking me. I'm just reporting these results as they may 
expose a bug.

Regards,

Z

From: Ashutosh Chauhan [mailto:hashut...@apache.org]
Sent: 10 June 2013 16:48
To: user@hive.apache.org
Subject: Re: Use of virtual columns in joins

You might be hitting into https://issues.apache.org/jira/browse/HIVE-4033 in 
which case its recommended that you upgrade to 0.11 where in this bug is fixed.

On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron 
<peter.mar...@trilliumsoftware.com<mailto:peter.mar...@trilliumsoftware.com>> 
wrote:
Hi,

I'm using hive 0.10.0 over hadoop 1.0.4.

I have created a couple of test tables and found that  various join queries
that refer to virtual columns fail. For example the query:

SELECT * FROM a JOIN b ON b.rownumber = a.number;

works but the following three queries all fail.

SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE;

They all fail in the same way, but I am too much of a newb to be able to
tell much from the error message:

Error during job, obtaining debugging information...
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
  Stage-1

Logs:

/tmp/pmarron/hive.log

When I look in the log I can find this:

2013-06-07 14:06:22,831 WARN  mapred.LocalJobRunner 
(LocalJobRunner.java:run(298)) - job_local_0001
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing writable 1,0
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing writable 1,0
        at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
        ... 4 more
Caused by: java.lang.NullPointerException
        at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516)
        ... 5 more

and I've looked at the source code referred to, but it doesn't mean much to me, 
I'm afraid.

For completeness here's a description of the tables:

    > select * from a;
OK
first        1              primo
second 2              secondo
third      3              terzo
fourth   4              quarto
fifth       5              quinto
sitxh      6              sesto
seventh               7              settimo
eigthth 8              ottavo
ninth     9              nono
tenth     10           decimo
Time taken: 0.105 seconds
hive> describe extended a;
OK
english  string
number                bigint
italian    string

hive> select * from b;
OK
1              0
2              14
3              31
4              45
5              61
6              77
7              91
8              109
9              126
10           139
Time taken: 0.067 seconds
hive> describe  b;
OK
rownumber        bigint
offset    bigint
Time taken: 0.072 seconds
hive>

These queries aren't actually important to me, as I am taking a different 
approach.
But I thought that it might be important to mention these failures if they 
expose
a bug. Or maybe I'll learn that I'm doing something and there's a way to get 
these
joins to work...

Regards,

Z

RE: Use of virtual columns in joins

Reply via email to