[ 
https://issues.apache.org/jira/browse/HIVE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876828#comment-14876828
 ] 

Gopal V commented on HIVE-11882:
--------------------------------

[~yalovyyi]: Yes, this would totally suck for S3, because the list operations 
are expensive. 

But I thought this was already implemented in InputEstimator; maybe we're not 
hitting the implementation of the "remaining" check.

{code}
   * @param remaining Early exit condition. If it has positive value, further estimation
   *                  can be canceled on the point of exceeding it. In this case,
   *                  return any bigger length value than this (Long.MAX_VALUE, for example).
   */
{code}
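
To make that contract concrete, here is a minimal sketch of an early-exit length estimate, assuming the file lengths arrive lazily from an iterator (standing in for an expensive listing call). ThresholdEstimator and estimate are hypothetical names for illustration only; this is not Hive's actual InputEstimator code.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of the "remaining" early-exit contract; not Hive code.
public class ThresholdEstimator {

    /**
     * Sums file lengths, but stops pulling from the iterator as soon as the
     * running total exceeds {@code remaining}. A non-positive {@code remaining}
     * is treated as "no limit" (an assumption based on the quoted Javadoc).
     */
    public static long estimate(Iterator<Long> fileLengths, long remaining) {
        long total = 0;
        while (fileLengths.hasNext()) {
            total += fileLengths.next();
            if (remaining > 0 && total > remaining) {
                // Threshold exceeded: report "too big" without visiting the rest.
                return Long.MAX_VALUE;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Iterator<Long> lengths = Arrays.asList(400L, 700L, 900L, 50L).iterator();
        long result = estimate(lengths, 1000L);
        System.out.println(result == Long.MAX_VALUE ? "over threshold" : "total=" + result);
        // The iterator still has elements left, so the later files were never inspected.
        System.out.println("unvisited files remain: " + lengths.hasNext());
    }
}
```

With a threshold of 1000, the loop stops after the second file (400 + 700), so the remaining entries are never requested. On a store where each listing call is costly, that skipped work is the whole point.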

> Fetch optimizer should stop source files traversal once it exceeds the 
> hive.fetch.task.conversion.threshold
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11882
>                 URL: https://issues.apache.org/jira/browse/HIVE-11882
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 1.0.0
>            Reporter: Illya Yalovyy
>
> Hive 1.0's fetch optimizer tries to optimize queries of the form "select <C> 
> from <T> where <F> limit <L>" to a fetch task (see the 
> hive.fetch.task.conversion property). This optimization gets the lengths of 
> all the files in the specified partition and does some comparison against a 
> threshold value to determine whether it should use a fetch task or not (see 
> the hive.fetch.task.conversion.threshold property). One of the main problems 
> with this process of getting the lengths of all the files is that the fetch 
> optimizer doesn't seem to stop once the total exceeds 
> hive.fetch.task.conversion.threshold. It works fine on HDFS, but could cause 
> a significant performance degradation on other supported file systems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
