Exception in Hive with SMB join and Parquet

Suma Shivaprasad Wed, 30 Jul 2014 06:14:55 -0700

Am using 0.13.0 version of hive with parquet table having 34 columns
with the following props while creating the table



*CLUSTERED BY (udid)  SORTED BY (udid ASC) INTO 256 BUCKETS
STORED as PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY"); *

The query I am running is


*set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.mapjoin.smalltable.filesize=200000000;
set hive.vectorized.execution.enabled = true;*


*set hive.stats.fetch.column.stats=true;
set hive.stats.collect.tablekeys=true;
set hive.stats.reliable=true;*

*select sum(rev),sum(adimp)
 from user_rr_parq rr join user_domain_parq dm on rr.udid = dm.id <http://dm.id>
 where dt = '..' and hour='..'
 and dm.age.source = '..'
 and dm.age.id <http://dm.age.id> IN ('..')
 group by rr.udid;*


with both user_rr_parq and user_domain_parq both clustered and sorted
by same join key

*Exception in Mapper logs*

2014-07-30 12:44:08,577 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1

2014-07-30 12:44:08,579 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row
{"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"cpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row
{"udid":"+HkGEOKZopHELKtUDdJzOUPr5yuSHxTHN5iknyzNSjE=","optout":null,"uage":null,"ugender":null,"siteid":null,"handsetid":null,"intversion":null,"intmethod":null,"intfamily":null,"intdirect":null,"intorigin":null,"advid":null,"campgnid":null,"adgrpidbig":null,"ccid":null,"locsrc":null,"adid":null,"adidbig":null,"market":null,"nfr":null,"uidparams":null,"time":null,"disc_uidparams":null,"vldclk":null,"fraudclk":null,"totalburn":null,"pubcpc":null,"agencycpc":null,"rev":0.0,"adimp":0,"pgimp":null,"mkvalidadreq":null,"mkvalidpgreq":null,"map_uid":null,"dt":"2014-06-01","hour":"00"}
        at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more*Caused by:
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException:
java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:773)
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.setupContext(SMBMapJoinOperator.java:710)
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.setUpFetchContexts(SMBMapJoinOperator.java:538)
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:248)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        ... 9 more*Caused by: java.io.IOException:
java.lang.IndexOutOfBoundsException: Index: 29, Size: 5*
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:636)
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.next(SMBMapJoinOperator.java:794)
        at 
org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator$MergeQueue.nextHive(SMBMapJoinOperator.java:771)
        ... 16 more*Caused by: java.lang.IndexOutOfBoundsException: Index: 29, 
Size: 5*
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at 
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:96)
        at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
        at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:471)
        at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
        ... 18 more

Looks like it is trying to access the column with index as 29 where as
there are only 5 non null columns being present in the row - which
matches the Arraylist size.

What could be going wrong here?


Thanks

Suma

Exception in Hive with SMB join and Parquet

Reply via email to