[ https://issues.apache.org/jira/browse/HIVE-22438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenning Ding reassigned HIVE-22438:
-----------------------------------

> Additional comma is added to projection column ids
> --------------------------------------------------
>
>                 Key: HIVE-22438
>                 URL: https://issues.apache.org/jira/browse/HIVE-22438
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Wenning Ding
>            Assignee: Wenning Ding
>            Priority: Major
>
> I ran into this issue when querying Hudi data through Hive.
> To query a Hudi-style table, Hudi implements its own InputFormat class and
> overrides the getRecordReader method. In this method, for certain reasons,
> Hudi manually adds several projection column ids and projection column names
> each time getRecordReader is called, like this:
>
> {code:java}
> public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split,
>     final JobConf job, final Reporter reporter) throws IOException {
>   // Add the required column name if it is not already projected.
>   if (!job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR).contains("col_a")) {
>     job.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "col_a");
>   }
>   // Add the matching column id if it is not already projected.
>   if (!job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR).contains("1")) {
>     job.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "1");
>   }
>   return super.getRecordReader(split, job, reporter);
> }
> {code}
>
> This causes a problem for COUNT(*) or COUNT(1) queries. Note that for
> COUNT(*) or COUNT(1), Hive doesn't need to read any column, so the projection
> column ids value is an empty string.
> Here is a log example that shows the whole workflow.
> {code:java}
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 0 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections : Ids :
> [INFO] [TezChild] |hadoop.HoodieParquetInputFormat|: After adding Hoodie columns, Projections :col_a Ids :1
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 1 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections :col_a Ids :,1
> {code}
> As we can see, on the second call the projection ids value becomes ",1", and
> that additional comma causes exceptions later in the program.
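
To illustrate why a value like ",1" is harmful, here is a minimal, self-contained sketch. It does not use Hive or Hudi code: the class ProjectionIdsSketch and the methods naiveAppend, safeAppend and parseIds are hypothetical names made up for this example, and the assumption that the ids are combined by string concatenation and later parsed as integers is only one plausible way the failure could arise, not a statement about where the bug actually lives.

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectionIdsSketch {

    // Hypothetical naive append: always inserts a separator,
    // even when the existing value is empty.
    static String naiveAppend(String existing, String id) {
        return existing + "," + id;
    }

    // Hypothetical safe append: skips the separator when there is
    // nothing to separate from.
    static String safeAppend(String existing, String id) {
        if (existing == null || existing.isEmpty()) {
            return id;
        }
        return existing + "," + id;
    }

    // Strict parser (an assumption for this sketch): every token
    // between commas must be a valid integer.
    static List<Integer> parseIds(String ids) {
        List<Integer> result = new ArrayList<>();
        if (ids.isEmpty()) {
            return result;
        }
        for (String token : ids.split(",")) {
            // For ",1" the first token is the empty string, which
            // makes Integer.parseInt throw NumberFormatException.
            result.add(Integer.parseInt(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // COUNT(*) case: the projection ids value starts out empty.
        String naive = naiveAppend("", "1");
        String safe = safeAppend("", "1");
        System.out.println("naive = \"" + naive + "\", safe = \"" + safe + "\"");

        System.out.println("parsed safe ids: " + parseIds(safe));
        try {
            parseIds(naive);
        } catch (NumberFormatException e) {
            System.out.println("parsing the naive value failed: " + e.getMessage());
        }
    }
}
```

In this sketch, guarding against the empty existing value in the append step is enough to avoid the leading comma; a parser that skips empty tokens would be another option, at the cost of hiding malformed values.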