[jira] [Commented] (HIVE-22964) MM table split computation is very slow

Peter Vary (Jira) Thu, 05 Mar 2020 03:33:14 -0800


    [ https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052036#comment-17052036 ]


Peter Vary commented on HIVE-22964:
-----------------------------------

Hi [~aditya-shah],
 * I am not a big fan of renaming configuration variables. They can wreak havoc 
when upgrading a cluster.
 * Sorry, I missed the error handling part. My bad :( But this highlights 
why it is good practice to wrap try/catch around only the part of the code 
where the exception can actually be thrown:
{code:java}
for (Future<MMPathInfo> pathFuture : pathFutures) {
  // Future.get() is the call that can throw, so this is the part to guard
  MMPathInfo pathInfo = pathFuture.get();
  finalPaths.addAll(pathInfo.getFinalPaths());
  pathsWithFileOriginals.addAll(pathInfo.getPathsWithFileOriginals());
}
{code}

 * Why are we using ugi.doAs? I checked the other file-related pool 
implementations and did not find it used anywhere.
 * Keeping Guava versions in sync between projects is usually a nightmare, so I 
prefer to use it only when it is really useful. According to the docs 
([https://guava.dev/releases/19.0/api/docs/com/google/common/collect/Lists.html#newArrayList()]),
 Lists.newArrayList() should be treated as deprecated. Is there a specific 
reason to use it here instead of the standard Java new ArrayList<>()?
 * Maybe if we used lambdas to submit the tasks, we could get rid of the 
ProcessForWriteIdsForMmReadCallable / MMPathInfo objects. What do you think?
 * Also, once we have output from the Yetus run, please check the 
checkstyle/findbugs results for any newly introduced warnings.
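The error-handling, ArrayList and lambda suggestions above could fit together roughly like this. It is a minimal, self-contained sketch under stated assumptions, not the patch: PathInfo, processDirectory and collectAll are hypothetical stand-ins for MMPathInfo / ProcessForWriteIdsForMmReadCallable, directories are plain strings instead of Hadoop Paths, the "_orig" suffix is an invented marker for directories with original files, no ugi.doAs or real file system listing is involved, and the sketch creates its own thread pool only to stay self-contained. Wrapping task failures in IOException is also an assumption.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MmSplitListingSketch {

  /** Hypothetical stand-in for MMPathInfo: the result computed for one directory. */
  static final class PathInfo {
    final List<String> finalPaths;
    final List<String> pathsWithFileOriginals;

    PathInfo(List<String> finalPaths, List<String> pathsWithFileOriginals) {
      this.finalPaths = finalPaths;
      this.pathsWithFileOriginals = pathsWithFileOriginals;
    }
  }

  /** Hypothetical per-directory work; a real version would list the directory. */
  static PathInfo processDirectory(String dir) {
    List<String> finals = new ArrayList<>();
    finals.add(dir + "/delta_1_1");
    List<String> originals = new ArrayList<>();
    if (dir.endsWith("_orig")) { // hypothetical marker for original files
      originals.add(dir);
    }
    return new PathInfo(finals, originals);
  }

  /** Processes every directory in parallel and merges results in input order. */
  static PathInfo collectAll(List<String> dirs, int threads) throws IOException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // A lambda per task replaces a dedicated Callable class.
      List<Future<PathInfo>> futures = new ArrayList<>();
      for (String dir : dirs) {
        futures.add(pool.submit(() -> processDirectory(dir)));
      }

      List<String> finalPaths = new ArrayList<>();
      List<String> pathsWithFileOriginals = new ArrayList<>();
      for (Future<PathInfo> future : futures) {
        PathInfo info;
        // Only Future.get() throws the checked exceptions, so only it is guarded.
        try {
          info = future.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw new IOException("Interrupted while processing paths", e);
        } catch (ExecutionException e) {
          throw new IOException("Failed to process paths", e.getCause());
        }
        finalPaths.addAll(info.finalPaths);
        pathsWithFileOriginals.addAll(info.pathsWithFileOriginals);
      }
      return new PathInfo(finalPaths, pathsWithFileOriginals);
    } finally {
      pool.shutdownNow(); // stop any tasks still running after a failure
    }
  }

  public static void main(String[] args) throws IOException {
    PathInfo result = collectAll(Arrays.asList("t/p=1", "t/p=2_orig"), 2);
    System.out.println(result.finalPaths);             // [t/p=1/delta_1_1, t/p=2_orig/delta_1_1]
    System.out.println(result.pathsWithFileOriginals); // [t/p=2_orig]
  }
}
{code}

Merging in submission order keeps the result deterministic regardless of which task finishes first, and any task failure surfaces as a single IOException while the finally block stops the remaining work.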

Thanks,
 Peter

> MM table split computation is very slow
> ---------------------------------------
>
>                 Key: HIVE-22964
>                 URL: https://issues.apache.org/jira/browse/HIVE-22964
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aditya Shah
>            Assignee: Aditya Shah
>            Priority: Major
>         Attachments: HIVE-22964.patch
>
>
> Since for MM tables we process the paths before inputFormat.getSplits(), we 
> end up listing the whole table at once. This could be optimized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)