[jira] [Commented] (HIVE-6613) Control when specific Inputs / Outputs are started

Siddharth Seth (JIRA) Wed, 12 Mar 2014 21:54:07 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932870#comment-13932870
 ]


Siddharth Seth commented on HIVE-6613:
--------------------------------------

Thanks for taking a look.

bq. Can you avoid creating a conf in TezCacheAccess? Maybe just pass it in 
get().
I was doing this once in the static block to avoid having to use a Configuration 
instance to access this class. TezCacheAccess is only supposed to be used with 
Tez. I could skip the factory altogether and instantiate the Tez cache 
directly? (The Configuration creation in this case should be very cheap since 
it isn't accessing external files.)

bq. Have you considered adding the input to the cache key instead of using a 
Set? 
The set just records in one place which inputs have been cached. I can use 
individual keys if you think that's better. That will get rid of the lock - 
since its primary purpose is to control the set creation.
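
For illustration, the individual-key alternative could look roughly like the 
sketch below. This is not code from the patch: the class name, the method names 
and the "queryId/inputName" key format are all hypothetical, and it only 
assumes that a concurrent map is an acceptable stand-in for the cache. Each 
input gets its own entry, so nothing has to lock around creating a shared set.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch only; this is not the TezCacheAccess class from the patch. */
public class CachedInputTracker {
    // One entry per (query, input) pair. The value is only a marker.
    private final ConcurrentMap<String, Boolean> cached = new ConcurrentHashMap<>();

    // Assumed key format: the input name is folded into the cache key.
    private static String key(String queryId, String inputName) {
        return queryId + "/" + inputName;
    }

    /** Marks an input as cached. Returns true only for the first caller. */
    public boolean registerCachedInput(String queryId, String inputName) {
        // putIfAbsent is atomic, so no explicit lock is required.
        return cached.putIfAbsent(key(queryId, inputName), Boolean.TRUE) == null;
    }

    /** Returns true if the input was previously registered for this query. */
    public boolean isInputCached(String queryId, String inputName) {
        return cached.containsKey(key(queryId, inputName));
    }
}
```

The trade-off is that the entries for one query are no longer grouped, so 
clearing them means iterating over keys rather than dropping a single set.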

bq. You can drop the getLocalWork check in the tez hashtable loader. Tez 
doesn't have local work.
bq. The javadoc of the init function needs to be updated with your changes.
Will fix

> Control when specific Inputs / Outputs are started
> --------------------------------------------------
>
>                 Key: HIVE-6613
>                 URL: https://issues.apache.org/jira/browse/HIVE-6613
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-6613.1.txt
>
>
> When running with Tez - a couple of enhancements are possible
> 1) Avoid re-fetching data in case of MapJoins - since the data is likely to 
> be cached after the first run (container re-use for the same query)
> 2) Start Outputs only after required Inputs are ready - specifically useful 
> in case of Reduce - where shuffle requires a large amount of memory, and the 
> Output (if it's a sorted output) also requires a fair amount of memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
