Steve Loughran created HADOOP-16456:
---------------------------------------

             Summary: Refactor the S3A codebase into a more maintainable and testable form
                 Key: HADOOP-16456
                 URL: https://issues.apache.org/jira/browse/HADOOP-16456
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
    Affects Versions: 3.3.0
            Reporter: Steve Loughran


The S3A codebase has become too complex to maintain. In particular:

* The lack of layering in the S3AFileSystem class means that every subcomponent 
(delegation, DynamoDB, block output stream etc.) is given a back reference to 
it and makes arbitrary calls into it.
* We can't test components in isolation. While integration tests are the most 
rigorous testing we have, they are slow, hard to inject failures into, and do 
not work on isolated parts of the code.
* The code within S3AFileSystem calls the top-level API methods internally, 
mixing the public interface with the implementation details.
* We are adding context through S3Guard calls for consistency, performance and 
recovery; we can't do that without a clean split between the public API and 
the internals.
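
To make the first two points concrete, here is a minimal sketch of the alternative to a back reference: the subcomponent is handed a narrow callback interface, so a unit test can inject failures without any S3 endpoint. All names here (DeleteCallbacks, DirectoryCleaner) are hypothetical and for illustration only; they are not classes in the S3A codebase.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NarrowCallbackSketch {

  /** Hypothetical narrow interface: the only store call this subcomponent needs. */
  interface DeleteCallbacks {
    void deleteObject(String key) throws IOException;
  }

  /** Hypothetical subcomponent: depends on the callbacks, not on the whole filesystem. */
  static final class DirectoryCleaner {
    private final DeleteCallbacks callbacks;

    DirectoryCleaner(DeleteCallbacks callbacks) {
      this.callbacks = callbacks;
    }

    /** Attempt to delete every key; return the keys whose deletion failed. */
    List<String> deleteAll(List<String> keys) {
      List<String> failed = new ArrayList<>();
      for (String key : keys) {
        try {
          callbacks.deleteObject(key);
        } catch (IOException e) {
          failed.add(key);
        }
      }
      return failed;
    }
  }

  /** A unit-test style run: the callback implementation injects a failure for one key. */
  static List<String> runWithInjectedFailure() {
    DeleteCallbacks failing = key -> {
      if (key.endsWith("/bad")) {
        throw new IOException("injected failure: " + key);
      }
    };
    return new DirectoryCleaner(failing)
        .deleteAll(Arrays.asList("dir/a", "dir/bad", "dir/c"));
  }

  public static void main(String[] args) {
    System.out.println(runWithInjectedFailure());
  }
}
```

In production the callback would be implemented by the filesystem itself; in a test it is a lambda, which is what makes fault injection cheap.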

Proposed: 

# Carefully break up the S3AFileSystem into a layered design.
# Use a "StoreContext" to bind components of the connector to it.
# Pass some form of operation context in with each request to represent the 
active operation and its state (including that for S3Guard BulkOperations).
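
As a rough illustration of how the two kinds of context might fit together, here is a minimal sketch. Only the name "StoreContext" comes from this proposal; the fields, the OperationContext class and the RenameSketch class are assumptions made for the example and do not describe the actual design in the linked document.

```java
public class ContextSketch {

  /** Long-lived, immutable binding to the store (fields are assumptions). */
  static final class StoreContext {
    private final String bucket;
    private final String username;

    StoreContext(String bucket, String username) {
      this.bucket = bucket;
      this.username = username;
    }

    String getBucket() { return bucket; }
    String getUsername() { return username; }
  }

  /** Hypothetical per-request context: identifies one operation and holds its state. */
  static final class OperationContext {
    private final String operation;
    private final long id;
    private int objectsCopied;

    OperationContext(String operation, long id) {
      this.operation = operation;
      this.id = id;
    }

    void objectCopied() { objectsCopied++; }
    int getObjectsCopied() { return objectsCopied; }

    @Override
    public String toString() { return operation + "#" + id; }
  }

  /** Hypothetical operation: sees only the two contexts, never the filesystem class. */
  static final class RenameSketch {
    private final StoreContext store;

    RenameSketch(StoreContext store) {
      this.store = store;
    }

    String execute(OperationContext op, String source, String dest) {
      op.objectCopied(); // state is tracked per operation, not per filesystem
      return String.format("%s by %s: s3a://%s/%s -> s3a://%s/%s",
          op, store.getUsername(),
          store.getBucket(), source, store.getBucket(), dest);
    }
  }

  static String demo() {
    StoreContext store = new StoreContext("example-bucket", "alice");
    OperationContext op = new OperationContext("rename", 1);
    return new RenameSketch(store).execute(op, "src/file", "dest/file");
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The point of the split is lifetime: the store context lives as long as the connector instance, while an operation context lives only as long as one request.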


See [refactoring S3A|https://github.com/steveloughran/engineering-proposals/blob/master/refactoring-s3a.md]

I've already started using some of this design in the HADOOP-15183 work, both 
to add those S3Guard bulk operations and to add a medium-life 
"RenameOperation". The proposal document reviews that experience and discusses 
improvements.

As noted: this needs to be done with care. We still need to maintain the 
existing codebase; the more radically we change the code, the greater the risk 
of the changes being wrong and the harder backporting becomes. But we can't 
sustain the current design.




