Sorry, I meant to paste this stack trace instead:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 385986740
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
    at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:663)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:167)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 385986740
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
    at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
    at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
    at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:663)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
    ... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 385986740
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
    at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:234)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
    ... 9 more
On Wed, Sep 25, 2013 at 2:16 PM, Steven Wong <sw...@netflix.com> wrote:

> For me, the bug exhibits itself in Hive 0.11 as the following stack trace. I'm putting it here so that people searching on a similar problem can find this discussion thread in a web search. The discussion thread contains a workaround and a patch.
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 175
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
>     at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
> ]
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 175
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
>     at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
>     at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Unknown Source)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
> ]
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
>     ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 175
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:131)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
>     ... 7 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 175
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
>     at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
>     at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:102)
>     at org.apache.hadoop.hive.ql.exec.JoinUtil.computeValues(JoinUtil.java:243)
>     at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:82)
>     ... 9 more
>
> On Mon, Sep 16, 2013 at 5:20 AM, Sun, Rui <rui....@intel.com> wrote:
>
>> Hi, Amit,
>>
>> You can see the description of HIVE-5256 for a more detailed explanation.
>>
>> Both table aliases and table names (if there is no alias) may run into this issue.
>>
>> This issue happened to be masked by the XML serialization/deserialization of the MapredWork containing the join operator (HashMap serialization/deserialization reverses the order of key-value pairs that share a bucket), and it was exposed by HIVE-4078, because the copy of the MapredWork made for the noconditionaltask optimization was optimized away.
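A minimal sketch of the bucket-order reversal described above (an illustration, not Hive code): the classic colliding pair "Aa"/"BB" stands in for the aliases, and a java.beans XML round-trip stands in for the MapredWork plan serialization. On JDK 6 and 7, the JDKs Hive 0.11 typically ran on, the colliding keys come back in the opposite order; JDK 8 changed HashMap's internals, so the reversal no longer shows there.

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.HashMap;

public class BucketOrderDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, so they share a HashMap bucket.
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println("before round-trip: " + map.keySet());

        // XML-encode and decode the map, the way Hive 0.11 round-tripped its plan.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        XMLEncoder enc = new XMLEncoder(bos);
        enc.writeObject(map);
        enc.close();
        XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()));
        @SuppressWarnings("unchecked")
        HashMap<String, Integer> copy = (HashMap<String, Integer>) dec.readObject();
        dec.close();

        // The decoder replays put() calls; on JDK 6/7 each put() prepends to the
        // bucket chain, so the two colliding keys come back in reversed order.
        System.out.println("after round-trip:  " + copy.keySet());
    }
}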
>>
>> From: Amit Sharma [mailto:amsha...@netflix.com]
>> Sent: Friday, September 13, 2013 6:05 AM
>> To: user@hive.apache.org
>> Subject: Re: 回复: hive 0.11 auto convert join bug report
>>
>> Hi Navis,
>>
>> I was trying to look at this email thread as well as the jira to understand the scope of this issue. Does it get triggered only when aliases happen to map to the same value upon hashing, or can it be triggered under other conditions as well? What if no aliases are used and the table names happen to map to colliding hashcode values?
>>
>> Also, is changing the alias the only workaround for this problem, or is some other workaround possible?
>>
>> Thanks,
>> Amit
>>
>> On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <navis....@nexr.com> wrote:
>>
>> Hi,
>>
>> Hive is notorious for producing different results with different aliases. Changing the alias was a last-resort way to avoid a bug in a desperate situation.
>>
>> I think the patch in the issue is ready; I hope it's helpful.
>>
>> Thanks.
>>
>> 2013/8/11 <wzc1...@gmail.com>:
>>
>> > Hi Navis,
>> >
>> > My colleague chenchun found that the hashcodes of 'deal' and 'dim_pay_date' collide, and that the code in MapJoinProcessor.java ignores the order of the row schema (a quick way to check the collision is sketched after this message). I looked at your patch, and it touches exactly the place we were working on. Thanks for your patch.
>> >
>> > On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
>> >
>> > Hi,
>> >
>> > I've booked this on https://issues.apache.org/jira/browse/HIVE-5056 and attached a patch for it.
>> >
>> > It needs a full test run for confirmation, but you can try it.
>> >
>> > Thanks.
>> >
>> > 2013/8/11 <wzc1...@gmail.com>:
>> >
>> > Hi all:
>> > When I change the table alias dim_pay_date to A, the query passes in Hive 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>> >
>> > use test;
>> > create table if not exists src (`key` int, `val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int, `date` string, `time` string, `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55, 5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small (userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT `A`.`date`, `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> > It's quite strange and interesting. I will keep searching for the answer to this issue.
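A quick way to check the collision claim above from a scratch file (again an illustration, not Hive code): the raw String hashcodes of the two aliases differ, but after the supplemental hash that JDK 6/7 java.util.HashMap applied before bucket masking, both aliases land in the same bucket of a default-capacity (16) table. The relative order of entries within one shared bucket is exactly what the XML round-trip sketched earlier can flip. Which map instance and capacity were actually involved inside Hive is an assumption here.

public class AliasBucketCheck {
    // The supplemental hash from JDK 6/7 java.util.HashMap: spreads the
    // hashCode's high bits downward before bucket masking.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket selection from the same HashMap: index = hash & (capacity - 1).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // Prints different hashCodes but the same bucket index for capacity 16.
        for (String alias : new String[] { "deal", "dim_pay_date" }) {
            System.out.println(alias + ": hashCode=" + alias.hashCode()
                    + ", bucket=" + indexFor(hash(alias.hashCode()), 16));
        }
    }
}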
>> >
>> > On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
>> >
>> > Hi all:
>> > I'm currently testing Hive 0.11 and have run into a bug with hive.auto.convert.join. I've constructed a testcase so everyone can reproduce it (also available here: https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>> >
>> > use test;
>> > create table src (`key` int, `val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int, `date` string, `time` string, `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55, 5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small (userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT `dim_pay_date`.`date`, `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> > You should replace the path to kv1.txt with your own. If you run the above query in Hive 0.11, it fails with an ArrayIndexOutOfBoundsException. You can see the explain result and the console output of the query here: https://gist.github.com/code6/6187569
>> >
>> > I compiled the trunk code, but it doesn't work with this query either. I can run this query in Hive 0.9 with hive.auto.convert.join turned on.
>> >
>> > I tried to dig into this problem, and I think it may be caused by the map join optimization: the input/output table info of some adjacent operators doesn't match (the column positions differ). A toy illustration of this failure mode is sketched after this message.
>> >
>> > I'm not able to fix this bug myself, and I would appreciate it if someone would look into it.
>> >
>> > Thanks.
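The "column positions differ" diagnosis above matches the shape of the ArrayIndexOutOfBoundsException in the stack traces: when two adjacent operators disagree about field order, a data byte gets interpreted as a length or type byte, and indexing runs off the end of the row buffer. A toy sketch of that failure mode, using a made-up two-field layout rather than Hive's actual LazyBinary format:

import java.nio.charset.StandardCharsets;

public class ColumnOrderMismatchDemo {
    // Writer emits the row as (date string, dealid int):
    // [1-byte length][UTF-8 bytes][4-byte big-endian int].
    static byte[] writeRow(String date, int dealid) {
        byte[] s = date.getBytes(StandardCharsets.UTF_8);
        byte[] row = new byte[1 + s.length + 4];
        row[0] = (byte) s.length;
        System.arraycopy(s, 0, row, 1, s.length);
        int off = 1 + s.length;
        row[off] = (byte) (dealid >>> 24);
        row[off + 1] = (byte) (dealid >>> 16);
        row[off + 2] = (byte) (dealid >>> 8);
        row[off + 3] = (byte) dealid;
        return row;
    }

    // Reader wrongly assumes the column order is (dealid int, date string).
    static String misreadRow(byte[] d) {
        int dealid = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                   | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF); // eats the length byte + 3 chars
        int len = d[4] & 0xFF;                  // actually a date character ('1' = 49)
        StringBuilder date = new StringBuilder();
        for (int i = 0; i < len; i++) {
            date.append((char) d[5 + i]);       // runs past d.length
        }
        return dealid + " / " + date;
    }

    public static void main(String[] args) {
        byte[] row = writeRow("2011-03-24", 748);  // values from the testcase
        System.out.println(misreadRow(row));       // throws ArrayIndexOutOfBoundsException
    }
}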