[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008549#comment-13008549 ]
Joydeep Sen Sarma commented on HIVE-2051:
-----------------------------------------

Siying - I think we shouldn't ignore ExecutionException. The best part of checking each task's status is that we can find out whether any of them failed (indicated by an ExecutionException). We can also remove the executor.awaitTermination() call (same feedback as the comments above).

Also, do you want to make the core of this routine synchronized (perhaps on the Context object, which is one per query)? There really is no point in running more than one of these per query at a time. We could move this whole routine to the Context object if that seems like a better place (or at least make the call from the Context object, where it can be marked as a synchronized method).

Otherwise looks good. Please upload a new patch and I will test and commit.

> getInputSummary() to call FileSystem.getContentSummary() in parallel
> --------------------------------------------------------------------
>
>                 Key: HIVE-2051
>                 URL: https://issues.apache.org/jira/browse/HIVE-2051
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>            Priority: Minor
>         Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch,
> HIVE-2051.4.patch
>
>
> getInputSummary() currently calls FileSystem.getContentSummary() one by one,
> which can be extremely slow when the number of input paths is huge. By calling
> those functions in parallel, we can cut latency in most cases.
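A minimal sketch of the review suggestions, assuming Java 8+. It is not the HIVE-2051 patch, and every name in it is hypothetical: `ContentLengthSource` stands in for `FileSystem.getContentSummary(path).getLength()`, and `QueryContext` stands in for Hive's per-query `Context`. It shows three things: checking every task's `Future` so a failure surfaces as an `ExecutionException`, relying on the blocking `Future.get()` calls instead of `executor.awaitTermination()`, and synchronizing on the per-query object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for FileSystem.getContentSummary(path).getLength().
interface ContentLengthSource {
  long lengthOf(String path) throws Exception;
}

// Hypothetical stand-in for Hive's per-query Context object.
class QueryContext {
  // synchronized: at most one summary computation runs at a time per QueryContext instance.
  synchronized long getInputSummary(List<String> paths, ContentLengthSource source, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      // Submit one task per input path so the lookups run in parallel.
      List<Future<Long>> results = new ArrayList<>();
      for (String path : paths) {
        Callable<Long> task = () -> source.lengthOf(path);
        results.add(executor.submit(task));
      }
      long total = 0;
      for (Future<Long> result : results) {
        // get() blocks until this task finishes, so no awaitTermination() is needed.
        // An exception thrown inside the task is rethrown here wrapped in an
        // ExecutionException instead of being silently ignored.
        total += result.get();
      }
      return total;
    } finally {
      // Releases the pool threads; after a failure it also interrupts any tasks still running.
      executor.shutdownNow();
    }
  }
}
```

Because every task's `get()` has returned before the loop ends, all work is already finished at that point, which is why a separate wait on the executor adds nothing.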