[ https://issues.apache.org/jira/browse/HIVE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809117#comment-13809117 ]

Remus Rusanu commented on HIVE-5692:
------------------------------------

The implementation is much more aggressive now:

 - shouldFlush tests the in-use vs. max at each batch boundary, not only at the
checking limit. The checking limit is now used only to decide when to
probe/adjust the average variable row size.
 - flush is called in a while loop until shouldFlush returns false, i.e. it
flushes as much as necessary to stay within the prescribed bounds. Progress is
monitored to prevent an infinite loop.
 - the checking limit is configured via HiveConf
hive.vectorized.groupby.checkinterval
 - the flushing percent is configured via HiveConf
hive.vectorized.groupby.flush.percent
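The policy in the list above can be sketched as a toy Java model. Only the two HiveConf key names come from this comment; everything else is hypothetical and is not the code in the HIVE-5692 patch. That includes the class FlushPolicySketch, the fixed per-entry size, the plain Map standing in for HiveConf, and the fallback values. In this sketch the check interval counts rows processed since the last probe, which is an assumption.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the flush policy; not Hive's VectorGroupByOperator.
// A plain Map stands in for HiveConf.
final class FlushPolicySketch {
    static final String CHECK_INTERVAL_KEY = "hive.vectorized.groupby.checkinterval";
    static final String FLUSH_PERCENT_KEY = "hive.vectorized.groupby.flush.percent";

    final int checkInterval;   // rows between probes of the average row size (assumed unit)
    final float flushPercent;  // fraction of entries evicted by one flush() call
    final long maxMemory;      // memory budget for the aggregation hash table
    final Map<Integer, Long> hashTable = new LinkedHashMap<>(); // key -> count
    long avgEntrySize = 64;    // assumed bytes per entry; fixed in this sketch
    long rowsSinceCheck = 0;
    int probes = 0;
    long flushedEntries = 0;

    FlushPolicySketch(Map<String, String> conf, long maxMemory) {
        // Fallback values are illustrative, not necessarily Hive's defaults.
        this.checkInterval = Integer.parseInt(conf.getOrDefault(CHECK_INTERVAL_KEY, "100000"));
        this.flushPercent = Float.parseFloat(conf.getOrDefault(FLUSH_PERCENT_KEY, "0.1"));
        this.maxMemory = maxMemory;
    }

    long inUseMemory() {
        return hashTable.size() * avgEntrySize;
    }

    // In-use vs. max test; evaluated at every batch boundary.
    boolean shouldFlush() {
        return inUseMemory() > maxMemory;
    }

    // Placeholder probe: only records that it ran. A real operator would
    // re-measure variable-length row sizes here.
    void probeAverageRowSize() {
        probes++;
    }

    // Evicts flushPercent of the entries (at least one), oldest inserted first.
    void flush() {
        int toFlush = Math.max(1, (int) (hashTable.size() * flushPercent));
        Iterator<Map.Entry<Integer, Long>> it = hashTable.entrySet().iterator();
        while (toFlush > 0 && it.hasNext()) {
            it.next(); // a real operator would forward this aggregate downstream
            it.remove();
            toFlush--;
            flushedEntries++;
        }
    }

    void processBatch(int[] keys) {
        for (int key : keys) {
            hashTable.merge(key, 1L, Long::sum);
        }
        // The check interval only gates the size probe, not the flush test.
        rowsSinceCheck += keys.length;
        if (rowsSinceCheck >= checkInterval) {
            probeAverageRowSize();
            rowsSinceCheck = 0;
        }
        // Flush repeatedly until back within bounds; stop if a pass makes no progress.
        while (shouldFlush()) {
            int before = hashTable.size();
            flush();
            if (hashTable.size() >= before) {
                break;
            }
        }
    }
}
```

Because shouldFlush() runs after every batch, one oversized batch is flushed back under budget right away instead of waiting for the next check interval. That is the "more aggressive" behavior the comment describes.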



> Make VectorGroupByOperator parameters configurable
> --------------------------------------------------
>
>                 Key: HIVE-5692
>                 URL: https://issues.apache.org/jira/browse/HIVE-5692
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>            Priority: Minor
>         Attachments: HIVE-5692.1.patch, HIVE-5692.2.patch
>
>
> The FLUSH_CHECK_THRESHOLD and PERCENT_ENTRIES_TO_FLUSH should be configurable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
