Steve Loughran created HADOOP-13829: ---------------------------------------
Summary: S3A getContentSummary to use flat listFiles instead of treewalk Key: HADOOP-13829 URL: https://issues.apache.org/jira/browse/HADOOP-13829 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.8.0 Reporter: Steve Loughran Priority: Minor FS shell {{-count}} uses {{getContentSummary}} to summarise the contents; this slows significantly with directory tree depth. On wide directories, as the FileStatus[] array is built up before recursing down, if there are many millions of files, memory use becomes an issue Moving to a flat listFiles listing with iterator-based scanning would allow directory depth to become a near-non-issue, avoid memory problems. We'd need to reverse-construct the directory tree for its count summary; some hash map of parent paths could build that up while iterating through the files and adding up their sizes -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org