[ https://issues.apache.org/jira/browse/HIVE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775544#comment-16775544 ]

Gopal V commented on HIVE-21305:
--------------------------------

bq. So if the query inserts into a table, we do not add entries to the cache, 
but we still use the existing cache elements?

The cache is read-through, so the cache itself is in charge of reading data 
into itself; items are not read elsewhere and then placed into the cache.
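To make the read-through point concrete, here is a minimal sketch of a read-through cache in Java. It is not LLAP's implementation; ReadThroughCache and populateOnMiss are hypothetical names. It also shows one possible answer to the question above: because the cache performs the read itself, it can keep serving existing entries to an ETL query while declining to keep the data it loads on a miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a read-through cache: the cache loads data itself on a miss.
// Names (ReadThroughCache, populateOnMiss) are illustrative, not LLAP's actual classes.
public class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader; // reads from the underlying storage
    private int loads = 0;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /**
     * Returns the value for key. Existing entries are always served from the cache.
     * On a miss the cache reads the data itself and keeps the result only when
     * populateOnMiss is true (for example, false for an ETL query).
     */
    public V get(K key, boolean populateOnMiss) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        V value = loader.apply(key);
        loads++;
        if (populateOnMiss) {
            entries.put(key, value);
        }
        return value;
    }

    public boolean contains(K key) {
        return entries.containsKey(key);
    }

    public int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache = new ReadThroughCache<>(k -> "data:" + k);
        cache.get("hot", true);  // interactive query: miss, load, keep
        cache.get("hot", false); // ETL query: hit, served from the cache
        cache.get("etl", false); // ETL query: miss, load, not kept
        // prints "true false 2"
        System.out.println(cache.contains("hot") + " " + cache.contains("etl") + " " + cache.loadCount());
    }
}
```

Keeping the populate decision inside get() is what the read-through design allows; with a cache-aside design, every caller would have to make that choice itself.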

bq. We might be better off caching the small tables but skipping the big ones.

Once you decide not to cache in some scenario, the smallest tables are the 
least worth caching: the performance gain from caching shrinks as the tables 
get smaller.

A more granular decision might be helpful, but this is a "too obvious" general 
ticket and not the final version (we will learn as we implement and deploy).

The original customer case was for the SerDeEncodedDataReader (which burns CPU 
on intermediate transforms, not just on caching data), not for the ORC cache.

And the real issue was also displacement (the existing "hot data" getting 
displaced by this); in the most common scenario, the text data is never going 
to be read again.
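A toy illustration of the displacement problem, assuming a plain LRU policy built on an access-ordered LinkedHashMap. LLAP's actual eviction policy is not modeled, and DisplacementDemo and hotBlocksSurviving are hypothetical names. A one-pass scan over data that is never read again evicts the hot working set when it goes through the cache, and leaves it intact when it bypasses the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache showing how a one-off scan can evict "hot" entries.
// LLAP's real eviction policy is not modeled here.
public class DisplacementDemo {
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives least-recently-used ordering
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used entry when full
        }
    }

    /** Fills the cache with hot blocks, then runs a one-pass scan, optionally bypassing the cache. */
    static int hotBlocksSurviving(int capacity, int scanBlocks, boolean bypassForScan) {
        LruCache<String, Integer> cache = new LruCache<>(capacity);
        for (int i = 0; i < capacity; i++) {
            cache.put("hot-" + i, i); // working set of interactive queries
        }
        for (int i = 0; i < scanBlocks; i++) {
            if (!bypassForScan) {
                cache.put("etl-" + i, i); // text data read once and never again
            }
        }
        int surviving = 0;
        for (String key : cache.keySet()) {
            if (key.startsWith("hot-")) {
                surviving++;
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        System.out.println(hotBlocksSurviving(4, 4, false)); // prints 0: the scan displaced everything
        System.out.println(hotBlocksSurviving(4, 4, true));  // prints 4: the scan skipped the cache
    }
}
```

With capacity 4 and a 4-block scan, no hot blocks survive the cached scan, while all 4 survive when the scan skips the cache.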

> LLAP: Option to skip cache for ETL queries
> ------------------------------------------
>
>                 Key: HIVE-21305
>                 URL: https://issues.apache.org/jira/browse/HIVE-21305
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Priority: Major
>
> To prevent ETL queries from polluting the cache, it would be good to detect 
> such queries at compile time and optionally skip LLAP IO for them. 
> org.apache.hadoop.hive.ql.parse.QBParseInfo.hasInsertTables() is the simplest 
> way to catch ETL queries.
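A sketch of the compile-time decision the description proposes. QueryParseInfo is a stand-in for QBParseInfo so the example compiles on its own; only hasInsertTables() mirrors the method named in the ticket. The skipCacheForEtl option and the IoMode enum are hypothetical, not actual Hive settings.

```java
// Hypothetical sketch of the compile-time check proposed in the ticket.
// QueryParseInfo stands in for Hive's QBParseInfo; only hasInsertTables() mirrors a
// method named in the ticket. skipCacheForEtl is an assumed option, not a real Hive setting.
public class EtlCachePolicy {
    interface QueryParseInfo {
        boolean hasInsertTables();
    }

    enum IoMode { LLAP_IO_WITH_CACHE, SKIP_LLAP_IO }

    static IoMode chooseIoMode(QueryParseInfo parseInfo, boolean skipCacheForEtl) {
        // An INSERT target is the simplest compile-time signal that the query is ETL.
        if (skipCacheForEtl && parseInfo.hasInsertTables()) {
            return IoMode.SKIP_LLAP_IO;
        }
        return IoMode.LLAP_IO_WITH_CACHE;
    }

    public static void main(String[] args) {
        QueryParseInfo insertQuery = () -> true;  // e.g. INSERT INTO ... SELECT ...
        QueryParseInfo selectQuery = () -> false; // plain SELECT
        System.out.println(chooseIoMode(insertQuery, true));  // SKIP_LLAP_IO
        System.out.println(chooseIoMode(selectQuery, true));  // LLAP_IO_WITH_CACHE
        System.out.println(chooseIoMode(insertQuery, false)); // LLAP_IO_WITH_CACHE (option off)
    }
}
```

Keeping the check behind an option matches the "optionally" in the description: existing behavior is unchanged unless the option is turned on.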



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
