Steve Loughran created HADOOP-13829:
---------------------------------------

             Summary: S3A getContentSummary to use flat listFiles instead of treewalk
                 Key: HADOOP-13829
                 URL: https://issues.apache.org/jira/browse/HADOOP-13829
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 2.8.0
            Reporter: Steve Loughran
            Priority: Minor


The FS shell {{-count}} command uses {{getContentSummary}} to summarise the 
contents of a path; this slows significantly as directory tree depth increases. 
Wide directories have a second problem: the FileStatus[] array is built up 
before recursing down, so if there are many millions of files, memory use 
becomes an issue.


Moving to a flat {{listFiles}} listing with iterator-based scanning would make 
directory depth a near-non-issue and avoid the memory problems. We'd need to 
reverse-construct the directory tree for its count summary; a hash map of 
parent paths could build that up while iterating through the files and adding 
up their sizes.
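
A minimal sketch of that reverse-construction, in plain Java rather than against 
the real S3A code. The names {{FileEntry}}, {{Summary}} and {{FlatSummarizer}} are 
hypothetical stand-ins, not Hadoop classes, and the sketch assumes that paths 
are absolute, '/'-separated and under the root, and that the root directory 
itself is included in the directory count.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/** Hypothetical stand-in for one file returned by a flat, recursive listing. */
final class FileEntry {
    final String path;  // absolute, '/'-separated, e.g. "/data/a/f1"
    final long length;  // file size in bytes

    FileEntry(String path, long length) {
        this.path = path;
        this.length = length;
    }
}

/** Hypothetical result holder: total bytes, file count, directory count. */
final class Summary {
    final long length;
    final long fileCount;
    final long directoryCount;

    Summary(long length, long fileCount, long directoryCount) {
        this.length = length;
        this.fileCount = fileCount;
        this.directoryCount = directoryCount;
    }
}

final class FlatSummarizer {

    /**
     * Consume the listing one entry at a time. Only the set of directory
     * paths is held in memory, never the full list of files.
     */
    static Summary summarize(String root, Iterator<FileEntry> files) {
        Set<String> dirs = new HashSet<>();
        dirs.add(root);  // assumption: the root counts as a directory
        long bytes = 0;
        long count = 0;
        while (files.hasNext()) {
            FileEntry file = files.next();
            bytes += file.length;
            count++;
            // Walk up the ancestors, stopping at the first one already seen.
            // Because the root is pre-registered, the walk never goes above it.
            String parent = parentOf(file.path);
            while (parent != null && dirs.add(parent)) {
                parent = parentOf(parent);
            }
        }
        return new Summary(bytes, count, dirs.size());
    }

    /** Parent of an absolute path, or null for "/" or a path with no '/'. */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return null;
        }
        if (slash == 0) {
            return path.length() > 1 ? "/" : null;
        }
        return path.substring(0, slash);
    }
}
```

Memory use here scales with the number of distinct directories rather than the 
number of files. One limitation of the sketch: a directory that contains no 
files anywhere beneath it never appears in a files-only listing, so it would 
not be counted.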


