[ https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027103#comment-16027103 ]
Sergey Shelukhin commented on HIVE-16761:
-----------------------------------------

The call is in next:
{noformat}
nextValue(batch.cols[i], rowInBatch, schema.get(i), getStructCol(value, i)))
{noformat}
Schema is created from the vrbCtx:
{noformat}
schema = Lists.<TypeInfo>newArrayList(vrbCtx.getRowColumnTypeInfos());
{noformat}
The ctx is the same one passed from the LlapReader... created via "LlapInputFormat.createFakeVrbCtx(mapWork);" for the non-vectorized map work case, as I assume is the case here. I suspect the problem is that the latter is incorrect for this case.
{noformat}
static VectorizedRowBatchCtx createFakeVrbCtx(MapWork mapWork) throws HiveException {
  // This is based on Vectorizer code, minus the validation.
  // Add all non-virtual columns from the TableScan operator.
  RowSchema rowSchema = findTsOp(mapWork).getSchema();
  final List<String> colNames = new ArrayList<String>(rowSchema.getSignature().size());
  final List<TypeInfo> colTypes = new ArrayList<TypeInfo>(rowSchema.getSignature().size());
  for (ColumnInfo c : rowSchema.getSignature()) {
    String columnName = c.getInternalName();
    if (VirtualColumn.VIRTUAL_COLUMN_NAMES.contains(columnName)) continue;
    colNames.add(columnName);
    colTypes.add(TypeInfoUtils.getTypeInfoFromTypeString(c.getTypeName()));
  }

  // Determine the partition columns using the first partition descriptor.
  // Note - like vectorizer, this assumes partition columns go after data columns.
  int partitionColumnCount = 0;
  Iterator<Path> paths = mapWork.getPathToAliases().keySet().iterator();
  if (paths.hasNext()) {
    PartitionDesc partDesc = mapWork.getPathToPartitionInfo().get(paths.next());
    if (partDesc != null) {
      LinkedHashMap<String, String> partSpec = partDesc.getPartSpec();
      if (partSpec != null && partSpec.isEmpty()) {
        partitionColumnCount = partSpec.size();
      }
    }
  }
  return new VectorizedRowBatchCtx(colNames.toArray(new String[colNames.size()]),
      colTypes.toArray(new TypeInfo[colTypes.size()]), null, partitionColumnCount,
      new String[0]);
}
{noformat}
[~jdere] [~gopalv] does SMB join do something special wrt columns? Also, I see a bug right there with the partition column count. I wonder if that could be related...

> LLAP IO: SMB joins fail elevator
> ---------------------------------
>
>                 Key: HIVE-16761
>                 URL: https://issues.apache.org/jira/browse/HIVE-16761
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException:
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
>
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join
> customer_accounts_orc_200 b on a.account_id=b.account_id group by
> year,quarter;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
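For illustration of the failure mode discussed above: if the fake vrbCtx schema claims a column is a string while the batch actually carries a long vector, the reader's blind downcast produces exactly the ClassCastException in the trace. The classes below are minimal stand-ins, not Hive's real `ColumnVector` hierarchy.

```java
// Minimal stand-in hierarchy (NOT Hive's real classes) showing how a
// schema/batch type disagreement surfaces as the ClassCastException
// seen in the stack trace.
public class CastMismatchSketch {
    static class ColumnVector {}
    static class LongColumnVector extends ColumnVector { long[] vector = new long[1024]; }
    static class BytesColumnVector extends ColumnVector { byte[][] vector = new byte[1024][]; }

    // Loosely mimics BatchToRowReader.nextString: the schema said "string",
    // so the reader downcasts to the bytes vector without checking.
    static byte[] nextString(ColumnVector col, int row) {
        return ((BytesColumnVector) col).vector[row]; // throws if col is really a LongColumnVector
    }

    public static void main(String[] args) {
        ColumnVector batchCol = new LongColumnVector(); // what the batch actually holds
        try {
            nextString(batchCol, 0); // schema disagreement -> ClassCastException
        } catch (ClassCastException e) {
            System.out.println("cast mismatch, as in the trace");
        }
    }
}
```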
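A minimal sketch of the partition-column-count bug pointed out above: the quoted condition `partSpec != null && partSpec.isEmpty()` can only assign `size()` when the map is empty, so `partitionColumnCount` is always 0. The suspected intent is the negated check; the helper names here are hypothetical, for illustration only.

```java
import java.util.LinkedHashMap;

// Standalone illustration of the suspected inverted condition in
// LlapInputFormat.createFakeVrbCtx (method names are hypothetical).
public class PartColCountSketch {

    // Mirrors the quoted code: assigns only when the map is empty,
    // so the result is always 0.
    static int buggyCount(LinkedHashMap<String, String> partSpec) {
        int partitionColumnCount = 0;
        if (partSpec != null && partSpec.isEmpty()) {
            partitionColumnCount = partSpec.size(); // size() is 0 here by construction
        }
        return partitionColumnCount;
    }

    // Suspected intent: count partition columns when the spec is non-empty.
    static int fixedCount(LinkedHashMap<String, String> partSpec) {
        int partitionColumnCount = 0;
        if (partSpec != null && !partSpec.isEmpty()) {
            partitionColumnCount = partSpec.size();
        }
        return partitionColumnCount;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("year", "2017");
        spec.put("quarter", "Q2");
        System.out.println(buggyCount(spec)); // 0 despite two partition columns
        System.out.println(fixedCount(spec)); // 2
    }
}
```

With a two-column partition spec, the quoted logic reports 0 partition columns, which would make the fake vrbCtx mistake partition columns for data columns and could plausibly misalign the schema against the batch.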