[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008549#comment-13008549
 ] 

Joydeep Sen Sarma commented on HIVE-2051:
-----------------------------------------

Siying - I think we shouldn't ignore ExecutionException. The main benefit of 
checking each task's status is that we can find out whether any of them 
failed (indicated by ExecutionException). We can also remove the 
executor.awaitTermination() call (same feedback as the comments above).
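
To illustrate the point about per-task status, here is a minimal sketch (not 
the patch itself). PathSizer, totalSize and the class name are hypothetical 
stand-ins for the per-path getContentSummary() work. Calling Future.get() on 
each task blocks until that task is done and surfaces any failure as an 
ExecutionException, so a separate awaitTermination() wait is not needed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSummarySketch {

    // Hypothetical stand-in for the per-path work (e.g. one getContentSummary() call).
    interface PathSizer {
        long sizeOf(String path) throws Exception;
    }

    // Sums the size of every path, running the per-path work on a fixed thread pool.
    static long totalSize(List<String> paths, PathSizer sizer, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<Long> task = () -> sizer.sizeOf(path);
                futures.add(executor.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                // get() waits for this task to finish; if the task threw, the
                // failure is rethrown here wrapped in an ExecutionException.
                // It is deliberately not caught, so the caller sees the failure.
                total += future.get();
            }
            return total;
        } finally {
            // Every future has either completed or we are bailing out on an
            // error, so the pool can be torn down without awaitTermination().
            executor.shutdownNow();
        }
    }
}
```

The caller can inspect getCause() on the ExecutionException to see which 
underlying error (for example an IOException) made a task fail.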

Also - do you want to make the core of this routine synchronized (perhaps on 
the context object, which is one per query)? There really is no point in 
running more than one of these per query at a time. We can move this whole 
routine to the Context object if that seems like a better place, or at least 
make the call from the Context object, where it can be marked as a 
synchronized method.
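
A rough sketch of the synchronization idea, under stated assumptions: 
QueryContext below is a hypothetical stand-in for the per-query Context 
object, and the running/maxRunning counters exist only to make the effect 
observable. The sleep is a placeholder for the real summary work.

```java
public class SynchronizedSummarySketch {

    // Hypothetical stand-in for the per-query Context object.
    static class QueryContext {
        int running = 0;     // callers currently inside the critical section
        int maxRunning = 0;  // highest value of `running` ever observed
    }

    // Only one caller per context can be inside the synchronized block at a
    // time; callers holding different contexts do not block each other.
    static long getInputSummary(QueryContext ctx, long bytes) throws InterruptedException {
        synchronized (ctx) {
            ctx.running++;
            ctx.maxRunning = Math.max(ctx.maxRunning, ctx.running);
            try {
                Thread.sleep(5); // placeholder for the real summary computation
                return bytes;
            } finally {
                ctx.running--;
            }
        }
    }
}
```

If the routine were moved onto the context class itself, declaring it as a 
synchronized instance method would lock on the same object and have the same 
effect.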

Otherwise looks good. Please upload a new patch and I will test and commit.

> getInputSummary() to call FileSystem.getContentSummary() in parallel
> --------------------------------------------------------------------
>
>                 Key: HIVE-2051
>                 URL: https://issues.apache.org/jira/browse/HIVE-2051
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>            Priority: Minor
>         Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
> HIVE-2051.4.patch
>
>
> getInputSummary() currently calls FileSystem.getContentSummary() one by one, 
> which can be extremely slow when the number of input paths is huge. By 
> calling those functions in parallel, we can cut latency in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
