Re: Use of virtual columns in joins

Navis류승우 Wed, 26 Jun 2013 00:11:58 -0700

Yes, it's a bug. I've filed it as https://issues.apache.org/jira/browse/HIVE-4790.


2013/6/25 Peter Marron <peter.mar...@trilliumsoftware.com>:
> Hi,
>
>
>
> Sorry for the delay, but I finally got around to testing these queries with
> Hive 0.11.
>
> Things are improved. Two of the three queries now run fine. However one
> query still fails.
>
> So this query runs fine:
>
>
>
>                 SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
>
> But this one (which is _very_ similar)
>
>
>
>                 SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
>
> fails with this error:
>
>
>
>     > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
>
> Automatically selecting local only mode for query
>
> Total MapReduce jobs = 1
>
> setting HADOOP_USER_NAME        pmarron
>
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property
> hive.metastore.local no longer has any effect. Make sure to provide a valid
> value for hive.metastore.uris if you are connecting to a remote metastore.
>
> Execution log at: /tmp/pmarron/.log
>
> 2013-06-25 10:52:56     Starting to launch local task to process map join;
> maximum memory = 932118528
>
> java.lang.RuntimeException: cannot find field block__offset__inside__file from [0:rownumber, 1:offset]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
>         at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
>         at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>         at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
>         at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>         at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
>         at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
>         at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Execution failed with exit status: 2
>
> Obtaining error information
>
>
>
> Task failed!
>
> Task ID:
>
>   Stage-4
>
>
>
> Logs:
>
>
>
> /tmp/pmarron/hive.log
>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapredLocalTask
>
>
>
> There really doesn’t seem to be anything helpful in the logs either.
>
> It seems a little weird that it can find the virtual column in the first
> table, but not the second.
>
> Again, these are not blocking me. I’m just reporting these results as they
> may expose a bug.
>
>
>
> Regards,
>
>
>
> Z
>
>
>
> From: Ashutosh Chauhan [mailto:hashut...@apache.org]
> Sent: 10 June 2013 16:48
> To: user@hive.apache.org
> Subject: Re: Use of virtual columns in joins
>
>
>
> You might be hitting https://issues.apache.org/jira/browse/HIVE-4033, in
> which case it's recommended that you upgrade to 0.11, where this bug is
> fixed.
>
>
>
> On Mon, Jun 10, 2013 at 1:57 AM, Peter Marron
> <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
>
>
> I’m using Hive 0.10.0 over Hadoop 1.0.4.
>
>
>
> I have created a couple of test tables and found that various join queries
> that refer to virtual columns fail. For example, the query:
>
>
>
> SELECT * FROM a JOIN b ON b.rownumber = a.number;
>
>
>
> works but the following three queries all fail.
>
>
>
> SELECT *,a.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
>
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
>
> SELECT * FROM a JOIN b ON b.offset = a.BLOCK__OFFSET__INSIDE__FILE;
>
>
>
> They all fail in the same way, but I am too much of a newb to be able to
> tell much from the error message:
>
>
>
> Error during job, obtaining debugging information...
>
> Execution failed with exit status: 2
>
> Obtaining error information
>
>
>
> Task failed!
>
> Task ID:
>
>   Stage-1
>
>
>
> Logs:
>
>
>
> /tmp/pmarron/hive.log
>
>
>
> When I look in the log I can find this:
>
>
>
> 2013-06-07 14:06:22,831 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(298)) - job_local_0001
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 1,0
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable 1,0
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>         ... 4 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516)
>         ... 5 more
>
>
>
> and I’ve looked at the source code referred to, but it doesn’t mean much to
> me, I’m afraid.
>
>
>
> For completeness here’s a description of the tables:
>
>
>
>     > select * from a;
>
> OK
>
> first        1              primo
>
> second 2              secondo
>
> third      3              terzo
>
> fourth   4              quarto
>
> fifth       5              quinto
>
> sitxh      6              sesto
>
> seventh               7              settimo
>
> eigthth 8              ottavo
>
> ninth     9              nono
>
> tenth     10           decimo
>
> Time taken: 0.105 seconds
>
> hive> describe extended a;
>
> OK
>
> english  string
>
> number                bigint
>
> italian    string
>
>
>
> hive> select * from b;
>
> OK
>
> 1              0
>
> 2              14
>
> 3              31
>
> 4              45
>
> 5              61
>
> 6              77
>
> 7              91
>
> 8              109
>
> 9              126
>
> 10           139
>
> Time taken: 0.067 seconds
>
> hive> describe  b;
>
> OK
>
> rownumber        bigint
>
> offset    bigint
>
> Time taken: 0.072 seconds
>
> hive>
>
>
>
> These queries aren’t actually important to me, as I am taking a different
> approach. But I thought that it might be important to mention these failures
> if they expose a bug. Or maybe I’ll learn that I’m doing something wrong and
> there’s a way to get these joins to work…
>
>
>
> Regards,
>
>
>
> Z
>
>
>
>
