[ 
https://issues.apache.org/jira/browse/HIVE-24710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-24710:
------------------------------------
    Summary: Optimise PTF iteration for count(*) to reduce CPU and IO cost  
(was: PTFRowContainer could be reading more number of blocks than needed)

> Optimise PTF iteration for count(*) to reduce CPU and IO cost
> -------------------------------------------------------------
>
>                 Key: HIVE-24710
>                 URL: https://issues.apache.org/jira/browse/HIVE-24710
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: performance
>
> PTFRowContainer may read the first block repeatedly. The default block size 
> is around 25000 rows. For the first 25000 rowIdx values, it would re-read the 
> block repeatedly because of the "rowIdx < currentReadBlockStartRow" 
> condition.
> {noformat}
>   public Row getAt(int rowIdx) throws HiveException {
>     int blockSize = getBlockSize();
>     if (rowIdx < currentReadBlockStartRow || rowIdx >= currentReadBlockStartRow + blockSize) {
>       readBlock(getBlockNum(rowIdx));
>     }
>     return getReadBlockRow(rowIdx - currentReadBlockStartRow);
>   }
> {noformat} 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java#L167
>  
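To make the cost of a redundant block read easier to reason about, here is a minimal, self-contained sketch of block-cached row access with a read counter. It is not Hive's PTFRowContainer: the class name BlockCachedRows, the in-memory list standing in for spilled blocks, the -1 "nothing loaded" sentinel, and getReadCount() are all assumptions made for illustration. Only the range check in getAt mirrors the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for a block-backed row container.
 * NOT Hive's PTFRowContainer; rows are plain ints and "disk" is a list.
 */
class BlockCachedRows {
    private final List<int[]> blocks = new ArrayList<>(); // stands in for spilled blocks
    private final int blockSize;
    private final int rowCount;

    private int[] currentBlock;                 // block currently held in memory
    private int currentReadBlockStartRow = -1;  // assumption: -1 means nothing loaded yet
    private int readCount = 0;                  // number of block loads performed

    BlockCachedRows(int rowCount, int blockSize) {
        this.rowCount = rowCount;
        this.blockSize = blockSize;
        // Fill each block with its own row indexes so lookups are easy to verify.
        for (int start = 0; start < rowCount; start += blockSize) {
            int len = Math.min(blockSize, rowCount - start);
            int[] block = new int[len];
            for (int i = 0; i < len; i++) {
                block[i] = start + i;
            }
            blocks.add(block);
        }
    }

    private void readBlock(int blockNum) {
        currentBlock = blocks.get(blockNum);
        // Record where the loaded block starts so later lookups can reuse it.
        currentReadBlockStartRow = blockNum * blockSize;
        readCount++;
    }

    int getAt(int rowIdx) {
        if (rowIdx < 0 || rowIdx >= rowCount) {
            throw new IndexOutOfBoundsException("rowIdx=" + rowIdx);
        }
        // Reload only when no block is loaded or rowIdx falls outside the loaded block.
        if (currentBlock == null
                || rowIdx < currentReadBlockStartRow
                || rowIdx >= currentReadBlockStartRow + blockSize) {
            readBlock(rowIdx / blockSize);
        }
        return currentBlock[rowIdx - currentReadBlockStartRow];
    }

    int getReadCount() {
        return readCount;
    }
}
```

Under these assumptions, a sequential scan such as new BlockCachedRows(100, 25) followed by getAt(0) through getAt(99) should load each block exactly once. Comparing a counter like this against the number of blocks is one way to spot the kind of repeated reads described in this issue.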



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
