Steve Loughran created HADOOP-18650:
---------------------------------------

             Summary: improve s3a committer stats collected
                 Key: HADOOP-18650
                 URL: https://issues.apache.org/jira/browse/HADOOP-18650
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.3.5
            Reporter: Steve Loughran


We can improve the statistics that the s3a committer collects and saves to the JSON.

Key ones:
# number of task manifests read; duration of loads
# size of each manifest
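
As a rough illustration of the statistics listed above, here is a minimal sketch of a collector. The class ManifestLoadStats and its method names are hypothetical and are not part of the existing s3a committer code; how the values would be wired into the committer's JSON output is left open.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.LongSummaryStatistics;

/** Hypothetical collector for manifest load statistics; not an existing Hadoop class. */
final class ManifestLoadStats {
  // Size in bytes of every manifest loaded so far.
  private final List<Long> sizesBytes = new ArrayList<>();
  // Sum of the time spent in all loads, in nanoseconds.
  private long totalLoadNanos;

  /** Record one manifest load: its size and how long the load took. */
  synchronized void recordLoad(long sizeBytes, Duration loadTime) {
    sizesBytes.add(sizeBytes);
    totalLoadNanos += loadTime.toNanos();
  }

  /** Number of task manifests read. */
  synchronized int manifestsRead() {
    return sizesBytes.size();
  }

  /** Total duration of all loads. */
  synchronized Duration totalLoadDuration() {
    return Duration.ofNanos(totalLoadNanos);
  }

  /** Min, max, mean and sum of the manifest sizes. */
  synchronized LongSummaryStatistics sizeSummary() {
    return sizesBytes.stream().mapToLong(Long::longValue).summaryStatistics();
  }
}
```

Recording the size of each manifest individually, rather than only a running total, keeps the option of reporting min/max/mean in the saved JSON.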

I think we would also benefit if the commit thread pools could be large but
shared across all jobs (i.e. a demand-created thread pool in the s3a fs).
That would allow a pool size of, say, 500, while still supporting many jobs
actively committing at the same time (a busy Spark driver).

Finally: should the file commit pool be larger than the pool of manifest
readers? I think it could be, but the ratio should be fairly low.
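
The shared pool idea could look something like the sketch below, which uses only java.util.concurrent. SharedCommitPool and JobView are hypothetical names, not existing s3a classes, and the per-job semaphore limit is an assumption about how one job would be kept from starving the others.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical shared commit pool; one instance per filesystem, used by all jobs. */
final class SharedCommitPool {
  private final ThreadPoolExecutor pool;

  SharedCommitPool(int maxThreads, long idleSeconds) {
    // core == max with an unbounded queue: threads are created on demand,
    // one per submitted task, until maxThreads exist.
    pool = new ThreadPoolExecutor(maxThreads, maxThreads,
        idleSeconds, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    // Let idle threads expire so a quiet filesystem holds no threads.
    pool.allowCoreThreadTimeOut(true);
  }

  /** Per-job view of the shared pool, limited to maxConcurrentTasks at a time. */
  JobView forJob(int maxConcurrentTasks) {
    return new JobView(pool, maxConcurrentTasks);
  }

  int maxThreads() { return pool.getMaximumPoolSize(); }

  int liveThreads() { return pool.getPoolSize(); }

  void shutdown() { pool.shutdown(); }

  static final class JobView {
    private final ExecutorService delegate;
    private final Semaphore permits;

    JobView(ExecutorService delegate, int maxConcurrentTasks) {
      this.delegate = delegate;
      this.permits = new Semaphore(maxConcurrentTasks);
    }

    /** Blocks the caller while this job already has its full share in flight. */
    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
      permits.acquire();
      try {
        return delegate.submit(() -> {
          try {
            return task.call();
          } finally {
            permits.release();  // free the slot once the task finishes
          }
        });
      } catch (RuntimeException e) {
        permits.release();  // submission was rejected; give the permit back
        throw e;
      }
    }
  }
}
```

With this shape, a busy driver would create one SharedCommitPool of, say, 500 threads and give each committing job its own JobView; the per-job limits for file commits and manifest reads could then be set separately to experiment with the ratio between them.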




