[ https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593700#comment-15593700 ]
Rajesh Balamohan edited comment on HIVE-14953 at 10/21/16 2:16 AM:
-------------------------------------------------------------------

[~sershe] - It should be listFiles(path, recursive). I mistakenly wrote listStatus recursive in my earlier comment.

Default FS: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1814
S3A FS, which optimizes for bulk listing: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2025

So instead of thousands of calls to S3 with globStatus, it would end up using very few calls to S3 with listFiles(path, recursive), and client-side path filtering can be done as needed.

> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                 Key: HIVE-14953
>                 URL: https://issues.apache.org/jira/browse/HIVE-14953
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Sergey Shelukhin
>             Fix For: hive-14535
>
>         Attachments: HIVE-14953.patch
>
>
> Need to investigate if recursive get is faster. Also, normal listStatus might
> suffice because MM code handles directory structure in a more definite manner
> than old code; so it knows where the files of interest are to be found.
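A minimal sketch of the suggested approach: one recursive listFiles(path, true) call plus client-side filtering, instead of globStatus. It assumes the Hadoop 2.8-era `FileSystem.listFiles(Path, boolean)` and `PathFilter` APIs. The class `RecursiveListing`, the method `listMatchingFiles`, and the `mm_` directory-prefix filter are hypothetical and only for illustration; they are not taken from the HIVE-14953 patch. The number of S3 requests this actually produces depends on the S3A implementation linked above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

public final class RecursiveListing {
  private RecursiveListing() {}

  /**
   * Hypothetical filter: keep data files whose parent directory starts with "mm_",
   * skipping hidden/side files such as ".crc" checksums or "_SUCCESS" markers.
   */
  public static final PathFilter MM_DATA_FILES = path -> {
    String name = path.getName();
    Path parent = path.getParent();
    return parent != null
        && parent.getName().startsWith("mm_")
        && !name.startsWith(".")
        && !name.startsWith("_");
  };

  /**
   * Lists every file under root with a single recursive listFiles call and
   * applies the filter on the client side. Unlike globStatus, this does not
   * expand a pattern directory by directory; file systems such as S3A can
   * serve the recursive listing with bulk LIST requests.
   */
  public static List<Path> listMatchingFiles(FileSystem fs, Path root, PathFilter filter)
      throws IOException {
    List<Path> matches = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true); // true = recursive
    while (it.hasNext()) {
      Path p = it.next().getPath();
      if (filter == null || filter.accept(p)) {
        matches.add(p);
      }
    }
    return matches;
  }
}
```

Filtering after the listing keeps the remote call pattern independent of how selective the filter is. This matches the point above that filtering "can be done as needed" once the bulk listing is in hand.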