[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771652#comment-13771652
 ] 

Ashutosh Chauhan commented on HIVE-4113:
----------------------------------------

Thanks, [~yhuai] for taking this one up.
Its a known existing problem that predicate pushdown doesn't happen for 
HCatalog today. I will say that if it is getting burdensome, we can tackle that 
in a separate jira. 
I am fine with removing flag for column pruning. Its been around for a long 
time ( HIVE-279 ) and I haven't come across a case where user has run into 
problem with it.
I didn't get your comment about READ_ALL_COLUMNS_DEFAULT. If we set it to true, 
will that imply that this optimization will be off by default, that seems like 
a bad choice. In HCatInputFormat, we can probably set the config such that it 
always select all columns for now. That way Hive will still get the benefit of 
optimization and hcatalog will continue with what it is doing today. 
                
> Optimize select count(1) with RCFile and Orc
> --------------------------------------------
>
>                 Key: HIVE-4113
>                 URL: https://issues.apache.org/jira/browse/HIVE-4113
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Gopal V
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
> HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Where as, "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>       } else {
>         // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
>         // skip all columns, this should be distinguished from the case:
>         // select * from tt;
>         for (int i = 0; i < skippedColIDs.length; i++) {
>           skippedColIDs[i] = false;
>         }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to