[ https://issues.apache.org/jira/browse/HIVE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932870#comment-13932870 ]
Siddharth Seth commented on HIVE-6613:
--------------------------------------

Thanks for taking a look.

bq. Can you avoid creating a conf in TezCacheAccess? Maybe just pass it in get().
I was doing this once, in the static block, to avoid needing a Configuration instance just to access this class. TezCacheAccess is only supposed to be used with Tez. Should I skip the factory altogether and instantiate the Tez cache directly? (Creating the Configuration here should be very cheap, since it doesn't access any external files.)

bq. Have you considered adding the input to the cache key instead of using a Set?
The Set just groups the inputs that are cached together. I can use individual keys if you think that's better. That would also get rid of the lock, since the lock's primary purpose is to control creation of the Set.

bq. You can drop the getLocalWork check in the tez hashtable loader. Tez doesn't have local work.

bq. The javadoc of the init function needs to be updated with your changes.
Will fix.

> Control when specific Inputs / Outputs are started
> -------------------------------------------------
>
>                 Key: HIVE-6613
>                 URL: https://issues.apache.org/jira/browse/HIVE-6613
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-6613.1.txt
>
>
> When running with Tez, a couple of enhancements are possible:
> 1) Avoid re-fetching data for MapJoins, since the data is likely to be
> cached after the first run (container re-use for the same query).
> 2) Start Outputs only after the required Inputs are ready. This is especially
> useful for Reduce, where the shuffle requires a large amount of memory and the
> Output (if it's a sorted output) also requires a fair amount of memory.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
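The Set-versus-individual-keys trade-off discussed in the comment can be sketched in plain Java. This is a hypothetical illustration, not the HIVE-6613 patch: the class name `CachedInputTracker`, the key names, and the use of a `ConcurrentHashMap` as a stand-in for the container-level cache are all assumptions. It shows why the Set-based approach needs a lock (only to control creation of the shared Set) and why one key per input makes that lock unnecessary.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, not Hive's TezCacheAccess: a ConcurrentMap stands in
// for a per-container object cache that survives container re-use within
// the same query.
class CachedInputTracker {
    private final ConcurrentMap<String, Object> cache = new ConcurrentHashMap<>();

    // ---- Option 1: one Set of input names stored under a single key ----
    private static final String SET_KEY = "cachedInputs"; // hypothetical key
    private final Object setCreationLock = new Object();

    // The lock only ensures that two tasks cannot each create (and then
    // overwrite) their own copy of the shared Set.
    @SuppressWarnings("unchecked")
    private Set<String> getOrCreateSet() {
        synchronized (setCreationLock) {
            Set<String> set = (Set<String>) cache.get(SET_KEY);
            if (set == null) {
                set = ConcurrentHashMap.newKeySet();
                cache.put(SET_KEY, set);
            }
            return set;
        }
    }

    void markCachedViaSet(String inputName) {
        getOrCreateSet().add(inputName);
    }

    boolean isCachedViaSet(String inputName) {
        return getOrCreateSet().contains(inputName);
    }

    // ---- Option 2: one cache entry per input, keyed on the input name ----
    private static String keyFor(String inputName) {
        return "cached:" + inputName; // hypothetical key scheme
    }

    // putIfAbsent is atomic on a ConcurrentMap, so no explicit lock is needed.
    // Returns true only for the caller that registers the input first.
    boolean markCachedViaKey(String inputName) {
        return cache.putIfAbsent(keyFor(inputName), Boolean.TRUE) == null;
    }

    boolean isCachedViaKey(String inputName) {
        return cache.containsKey(keyFor(inputName));
    }
}
```

On a `ConcurrentMap`, Option 1's lock could also be replaced with `cache.computeIfAbsent(SET_KEY, k -> ConcurrentHashMap.newKeySet())`, which is atomic on `ConcurrentHashMap`; the explicit lock is kept above only to mirror the design described in the comment.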