[jira] [Assigned] (HIVE-22438) Additional comma is added to projection column ids

Wenning Ding (Jira) Thu, 31 Oct 2019 00:39:42 -0700


     [ https://issues.apache.org/jira/browse/HIVE-22438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wenning Ding reassigned HIVE-22438:
-----------------------------------


> Additional comma is added to projection column ids
> --------------------------------------------------
>
>                 Key: HIVE-22438
>                 URL: https://issues.apache.org/jira/browse/HIVE-22438
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Wenning Ding
>            Assignee: Wenning Ding
>            Priority: Major
>
> I ran into this issue when querying Hudi data through Hive.
> To query a Hudi-style table, Hudi implements its own InputFormat class and
> overrides the getRecordReader method. For various reasons, this method
> manually adds several projection column ids and projection column names
> each time getRecordReader is called, like this:
>  
> {code:java}
> public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split,
>         final JobConf job, final Reporter reporter) throws IOException {
>     // Force col_a (column id 1) into the projection on every call
>     if (!job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "").contains("col_a")) {
>         job.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "col_a");
>     }
>     if (!job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "").contains("1")) {
>         job.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "1");
>     }
>     return super.getRecordReader(split, job, reporter);
> }
> {code}
>  
> This causes a problem for COUNT(*) and COUNT(1) queries. For COUNT(*) or
> COUNT(1), Hive doesn't need to read any column, so the projection column ids
> string is empty.
> Here is a log example that shows the whole workflow.
> {code:java}
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 0 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections : Ids :
> [INFO] [TezChild] |hadoop.HoodieParquetInputFormat|: After adding Hoodie columns, Projections :col_a Ids :1
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 1 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections :col_a Ids :,1
> {code}
> As the log shows, on the second call the projection ids string becomes ",1",
> and that extra leading comma causes exceptions in later processing.
>  
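The report does not say where the leading comma is introduced. One plausible mechanism, assumed here purely for illustration, is code that joins a newly computed id list (empty for COUNT(*)) onto the existing value with a fixed "," separator. This sketch uses hypothetical names (`LeadingCommaDemo`, `naiveAppend`, `parseStrict`) that are not Hive or Hudi APIs; it reproduces the `Ids :,1` value from the log and shows one way such a value can fail downstream, here via `Integer.parseInt` on the empty token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Hive's or Hudi's actual code.
final class LeadingCommaDemo {

    // Naively prepends newIds to the existing value with a fixed ",".
    // If newIds is empty (as for COUNT(*)), the result starts with ",".
    static String naiveAppend(String newIds, String existing) {
        if (existing == null || existing.isEmpty()) {
            return newIds;
        }
        return newIds + "," + existing;
    }

    // Parses a comma-separated id string strictly. String.split keeps the
    // leading empty token of ",1", and Integer.parseInt("") throws.
    static List<Integer> parseStrict(String ids) {
        List<Integer> result = new ArrayList<>();
        if (ids.isEmpty()) {
            return result;
        }
        for (String token : ids.split(",")) {
            result.add(Integer.parseInt(token)); // NumberFormatException on ""
        }
        return result;
    }

    public static void main(String[] args) {
        // Split 0: COUNT(*) projects no ids; the reader then forces id "1".
        String afterSplit0 = "1";
        // Split 1 (assumed): an empty id list is joined onto the existing "1".
        String beforeSplit1 = naiveAppend("", afterSplit0);
        System.out.println("ids = \"" + beforeSplit1 + "\""); // prints: ids = ",1"
        try {
            parseStrict(beforeSplit1);
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```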

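One defensive approach, again only a sketch with hypothetical helper names (`SafeProjectionIds`, `joinIgnoringEmpty`, `parseLenient`) rather than any existing Hive API, is to skip empty parts when building the projection id string and to ignore empty tokens when reading it, so that an empty id list for COUNT(*) never turns into a stray comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helpers, not Hive's API: build and read projection id
// strings so that empty parts never produce stray commas.
final class SafeProjectionIds {

    // Joins the non-null, non-empty parts with ",".
    static String joinIgnoringEmpty(String... parts) {
        StringJoiner joiner = new StringJoiner(",");
        for (String part : parts) {
            if (part != null && !part.isEmpty()) {
                joiner.add(part);
            }
        }
        return joiner.toString(); // "" when every part was empty
    }

    // Parses ids, skipping empty tokens such as the one before "1" in ",1".
    static List<Integer> parseLenient(String ids) {
        List<Integer> result = new ArrayList<>();
        for (String token : ids.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                result.add(Integer.parseInt(trimmed));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(joinIgnoringEmpty("", "1"));  // prints 1
        System.out.println(joinIgnoringEmpty("0", "1")); // prints 0,1
        System.out.println(parseLenient(",1"));          // prints [1]
    }
}
```

Filtering on the write side keeps the configuration value clean for every consumer, while lenient parsing only protects the code that applies it.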


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
