[ https://issues.apache.org/jira/browse/HIVE-22964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051924#comment-17051924 ]
Peter Vary commented on HIVE-22964:
-----------------------------------

Hi [~aditya-shah],

There are multiple places where similar parallelization happens; see, for example, HIVE-22832.
What do you think about reusing the HIVE_MOVE_FILES_THREAD_COUNT configuration value for this as well? I know this is not ideal, but I see this config reused in multiple places where we want to parallelize file access/checks.

Also, if there is an error when accessing one of the files, the original solution stops immediately, while the new solution will try to access all of the files. This could be problematic for tables on S3 with a great number of files. (HIVE-22832 solves this as well.)

Thanks,
Peter

> MM table split computation is very slow
> ---------------------------------------
>
>                 Key: HIVE-22964
>                 URL: https://issues.apache.org/jira/browse/HIVE-22964
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Aditya Shah
>            Assignee: Aditya Shah
>            Priority: Major
>         Attachments: HIVE-22964.patch
>
>
> Since for MM table we process the paths prior to inputFormat.getSplits() we
> end up doing listing on the whole table at once. This could be optimized.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
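
To make the fail-fast concern in the comment concrete, here is a minimal sketch of parallel path checks that stop at the first error instead of touching every file. It is not taken from the HIVE-22964 patch or from HIVE-22832. The class FailFastPathCheck, the PathCheck interface, and the filter method are hypothetical names, and threadCount is assumed to be read by the caller from whichever configuration value is chosen.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only; not part of Hive.
final class FailFastPathCheck {

  /** Hypothetical per-path check; stands in for a file system call. */
  public interface PathCheck {
    boolean accept(String path) throws IOException;
  }

  /**
   * Runs the check on every path using a fixed-size pool and returns the
   * accepted paths in completion order (not input order). On the first
   * failure, outstanding tasks are cancelled and the error is rethrown.
   */
  static List<String> filter(List<String> paths, int threadCount, PathCheck check)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threadCount));
    CompletionService<String> done = new ExecutorCompletionService<>(pool);
    List<Future<String>> futures = new ArrayList<>();
    try {
      for (String p : paths) {
        // A null result marks a path that was checked but rejected.
        futures.add(done.submit(() -> check.accept(p) ? p : null));
      }
      List<String> accepted = new ArrayList<>();
      for (int i = 0; i < paths.size(); i++) {
        try {
          String p = done.take().get(); // next task to finish, in any order
          if (p != null) {
            accepted.add(p);
          }
        } catch (ExecutionException e) {
          // Fail fast: cancel queued and running tasks before rethrowing.
          for (Future<String> f : futures) {
            f.cancel(true);
          }
          Throwable cause = e.getCause();
          throw cause instanceof IOException ? (IOException) cause : new IOException(cause);
        }
      }
      return accepted;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Tasks that have not started yet are never run once an error is seen, which is the property that matters for tables with many files on S3. Tasks already in flight are only interrupted on a best-effort basis.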