[ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462592
 ]

ASF GitHub Bot logged work on HIVE-23843:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jul/20 15:34
            Start Date: 23/Jul/20 15:34
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459540489



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       I don't think we need to get into L1 cache details here. I'm OK with not 
having an LRU, since we might not need one. Let's stay at the logical level:
   
   My point is that if the hit count of an element that is kept for the next 
round is not reduced or reset to zero, such keys can hold their places in the 
cache for a long time purely because of very old cache hits. This could lead 
to a case in which all of the new elements (n/2) added to the cache are 
evicted, while very old entries that are no longer getting any hits keep half 
of the cache filled.
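
   To make the concern concrete, here is a minimal sketch of it. This is not 
the code from the pull request: the class DecayingEvictionSketch, its Slot 
type, and the access/flush methods are hypothetical stand-ins for 
mapKeysAggregationBuffers and the partial flush, and halving the counter is 
only one assumed way to "reduce" it (resetting it to zero would be the other 
option mentioned above).

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in for the hash-aggregation map; not Hive's actual classes. */
class DecayingEvictionSketch {

  /** Per-key state; only the access counter matters for this illustration. */
  static final class Slot {
    int accessCount;
  }

  final Map<String, Slot> cache = new HashMap<>();

  /** Records one hit for the key, creating its slot on first use. */
  void access(String key) {
    cache.computeIfAbsent(key, k -> new Slot()).accessCount++;
  }

  /**
   * Partial flush: evicts every entry whose count is not above the average.
   * When decay is true, the counter of each retained entry is halved so that
   * old hits lose weight in later rounds (an assumed policy, see lead-in).
   */
  void flush(boolean decay) {
    if (cache.isEmpty()) {
      return;
    }
    long total = 0;
    for (Slot slot : cache.values()) {
      total += slot.accessCount;
    }
    int avgAccess = (int) (total / cache.size());

    Iterator<Map.Entry<String, Slot>> iter = cache.entrySet().iterator();
    while (iter.hasNext()) {
      Slot slot = iter.next().getValue();
      if (avgAccess >= 1 && slot.accessCount > avgAccess) {
        if (decay) {
          slot.accessCount /= 2; // age the retained entry
        }
        continue; // keep this entry for the next round
      }
      iter.remove(); // the real operator would emit the row here
    }
  }
}
```

   With flush(false), a key that collected 100 hits in the first round and 
none afterwards stays above the average in every later round, so it is never 
evicted while the newer keys are. With flush(true), its counter shrinks each 
round until it falls to the average and the key is evicted.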




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 462592)
    Time Spent: 2h 10m  (was: 2h)

> Improve key evictions in VectorGroupByOperator
> ----------------------------------------------
>
>                 Key: HIVE-23843
>                 URL: https://issues.apache.org/jira/browse/HIVE-23843
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
