[ https://issues.apache.org/jira/browse/HADOOP-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mukund Thakur resolved HADOOP-17400.
------------------------------------
    Resolution: Fixed

> Optimize S3A for maximum performance in directory listings
> ----------------------------------------------------------
>
>                 Key: HADOOP-17400
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17400
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Mukund Thakur
>            Priority: Major
>
> Make listing in applications as fast as we can get it, especially for
> query planning.
> * All operations used in listing directories for query planning etc. are
> to be optimized for their primary use: being passed directories (not
> files). Make that case faster, even at the expense of more remote IO when
> handed files or empty directories.
> * Remove needless calls to S3 wherever possible (e.g.
> {{getFileStatus("/")}}, making bucket existence probes optional).
> * Support/enable asynchronous IO where possible.
>
> Review higher-level APIs (glob status) and their uses in the FsShell, and
> optimize their use by minimising invocations or FS API calls, with the
> bonus goal of reducing the risk of 404 caching.
> Work with downstream projects to move to the FS APIs which work best in
> this world, primarily the recursive listing operations and those which
> return RemoteIterator<FileStatus>, and so make any asynchronous page
> fetching operations useful.
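
To illustrate why iterator-returning listing APIs make asynchronous page fetching useful, here is a minimal sketch in plain Java. It is not the S3A implementation and does not use any Hadoop classes: PageSource, PrefetchingIterator and listAll are hypothetical names invented for this example, and java.util.Iterator stands in for Hadoop's RemoteIterator (whose methods can throw IOException). The idea shown is only that the next page of results is requested in the background while the caller is still consuming the current one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;

/** Sketch: an iterator that prefetches the next listing page asynchronously. */
public class PrefetchingListing {

    /** Hypothetical paged listing call: returns page N, or an empty list past the end. */
    interface PageSource {
        List<String> fetch(int pageNumber);
    }

    static final class PrefetchingIterator implements Iterator<String> {
        private final PageSource source;
        private Iterator<String> current = Collections.emptyIterator();
        private CompletableFuture<List<String>> pending;
        private int nextPage = 0;
        private boolean exhausted = false;

        PrefetchingIterator(PageSource source) {
            this.source = source;
            this.pending = requestPage(); // first page is requested eagerly
        }

        /** Starts fetching the next page on a background thread. */
        private CompletableFuture<List<String>> requestPage() {
            final int page = nextPage++;
            return CompletableFuture.supplyAsync(() -> source.fetch(page));
        }

        @Override
        public boolean hasNext() {
            while (!current.hasNext() && !exhausted) {
                List<String> page = pending.join(); // blocks only if the fetch is unfinished
                if (page.isEmpty()) {
                    exhausted = true;
                } else {
                    current = page.iterator();
                    pending = requestPage(); // overlap the next fetch with the caller's work
                }
            }
            return current.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Drains the iterator into a list; a real caller would process entries as they arrive. */
    static List<String> listAll(PageSource source) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new PrefetchingIterator(source);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two in-memory "pages" stand in for remote list responses.
        List<List<String>> pages = List.of(List.of("a", "b"), List.of("c"));
        PageSource src = n -> n < pages.size() ? pages.get(n) : List.<String>of();
        System.out.println(listAll(src)); // prints [a, b, c]
    }
}
```

A caller that collects a whole array of results before doing any work gains nothing from this overlap, which is one reason the description asks downstream projects to move to the iterator-returning and recursive listing operations.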