Ádám Szita created HIVE-22705:
---------------------------------

             Summary: LLAP cache is polluted by query-based compactor
                 Key: HIVE-22705
                 URL: https://issues.apache.org/jira/browse/HIVE-22705
             Project: Hive
          Issue Type: Improvement
            Reporter: Ádám Szita
            Assignee: Ádám Szita


One of the steps of query-based compaction is verifying the ACID sort order 
with the _validate_acid_sort_order_ UDF. This is a prerequisite for the actual 
compaction, and it is done by a [query that reads the whole table 
content|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java#L161-L167].

As a result, the whole table content is loaded into the LLAP cache. This 
content is not useful and only pollutes the cache, because it can never be 
read from the cache again: cache entries are bound to files (file IDs), and 
compaction replaces those files with new ones.

I propose that we disable LLAP caching in the session that runs the 
query-based compaction queries.
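
A minimal sketch of the idea, not the actual patch: it assumes that the 
existing Hive property hive.llap.io.enabled is the right switch for this, and 
it models the session configuration as a plain map instead of a real HiveConf. 
The class and method names (CompactionSessionConf, forCompaction) are 
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSessionConf {
    // Assumption: this property is the switch to flip for the compactor's session.
    static final String LLAP_IO_ENABLED = "hive.llap.io.enabled";

    /**
     * Returns a copy of the base settings with LLAP IO turned off.
     * The input map is left untouched, so other sessions keep their caching.
     */
    static Map<String, String> forCompaction(Map<String, String> base) {
        Map<String, String> conf = new LinkedHashMap<>(base);
        conf.put(LLAP_IO_ENABLED, "false");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put(LLAP_IO_ENABLED, "true");

        Map<String, String> compaction = forCompaction(base);
        System.out.println("compaction session: " + compaction.get(LLAP_IO_ENABLED));
        System.out.println("other sessions:     " + base.get(LLAP_IO_ENABLED));
    }
}
```

The override is applied to a copy so that only the compactor's own queries 
bypass the cache; regular user queries are not affected.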



