Thanks for the detailed explanation, Aaron. Knowing that this has gone through Cloudera's QA cycle and is running in production adds a lot of confidence in the feature. Looking forward to having this in 3.0.0-beta1!

Best,
Andrew

On Wed, Aug 16, 2017 at 2:17 PM, Aaron Fabbri <fab...@cloudera.com> wrote:

> On Wed, Aug 16, 2017 at 1:39 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Hi Steve,
>>
>> What's the target release vehicle, and the timeline for merging this? The
>> target date for beta1 is mid-September, so any large code movements make
>> me nervous.
>
> I think this is ready to get in before beta1. Most of upstream s3a dev
> has been happening on this branch so it has a lot of improvements and
> testing.
>
>> Could you comment on testing and API stability of this branch? I'm
>> trusting the judgement of the contributors involved, since there isn't
>> much time to fix things before beta1.
>
> We've done a ton of testing on this branch:
>
> - List consistency tests with failure injection. (HADOOP-13793) This
>   integration test forces a delay in visibility of certain files by
>   wrapping the AWS S3 client. It asserts listing is consistent. The test
>   fails without S3Guard, and succeeds with it.
>
> - All existing S3 integration tests with and without S3Guard. The
>   filesystem contract tests have been invaluable here. (HADOOP-13589 makes
>   these very easy to run).
>
> - MetadataStore contract tests that ensure that the API semantics of the
>   DynamoDB and in-memory reference implementations are correct.
>
> - MetadataStore scale tests that can be used to force DynamoDB service
>   throttling and ensure we are robust to that.
>
> - Unit tests for different parts of the S3Guard logic.
>
> As you probably know, at Cloudera we are using this codebase in
> production, and have run all of our downstream tests including Hive, Spark,
> Impala on the new S3A client code, with and without S3Guard enabled.
>
> In terms of API compatibility, the new features sit behind the FileSystem
> / FileContext APIs, which have not changed. Applications don't require any
> changes.
> Internal APIs for S3Guard, such as MetadataStore (currently
> private / evolving), should be properly annotated already. The S3Guard
> work has been active for quite a while now, so the APIs are fairly stable
> in practice.
>
> Probably my biggest goal in writing the S3AFileSystem integration code
> (HADOOP-13651) was to preserve existing logic and correctness when S3Guard
> is not enabled. One design choice which has worked well was to define a
> "null" implementation of the MetadataStore (the API that filesystem clients
> use to log metadata changes):
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
>
> This is used in S3A by default. This made it easier to reason about
> correctness and minimized the size of the diff to the FS client as well.
>
> Other questions welcomed!
>
> Cheers,
> Aaron
>
>> Best,
>> Andrew
>>
>> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>
>> > FYI, We're getting ready for a patch to merge the current S3Guard
>> > branch, HADOOP-13345, via a patch
>> > https://issues.apache.org/jira/browse/HADOOP-13998
>> >
>> > After that's done, we do plan to have a second iteration, work on a
>> > 0-rename committer (HADOOP-13786) with all the other tuning and
>> > improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
>> > and/or do things patch-by-patch.
>> >
>> > Anyway, now is a great time for people to download and play
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>> >
>> > testing this
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>> >
>> > The Inconsistent AWS Client is also something everyone is free to use
>> > for injecting inconsistencies (and soon faults) into their own apps by
>> > way of 2-3 config options. Want to know how your code handles S3A being
>> > observably inconsistent? We'll let you do that.
>> >
>> > -Steve
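
For readers who want a feel for the failure-injection idea mentioned in the thread (wrapping the S3 client so that certain files become visible in listings only after a delay), here is a minimal sketch. It is not the Inconsistent AWS Client from the branch and does not use the AWS SDK: the `ObjectLister` interface, both classes, and the "hide for N listings" rule are hypothetical names and simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for an object-store listing client (not the AWS SDK). */
interface ObjectLister {
    List<String> listKeys(String prefix);
}

/** A consistent in-memory "store" that plays the role of the wrapped client. */
class InMemoryLister implements ObjectLister {
    private final List<String> keys = new ArrayList<>();

    void put(String key) {
        keys.add(key);
    }

    @Override
    public List<String> listKeys(String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (k.startsWith(prefix)) {
                out.add(k);
            }
        }
        return out;
    }
}

/**
 * Wrapper that injects listing inconsistency: keys matching the predicate
 * are left out of the first N listings, then appear as normal.
 * Counting listings (an assumption of this sketch) keeps it deterministic;
 * a real injector would more likely use wall-clock delays.
 */
class DelayedVisibilityLister implements ObjectLister {
    private final ObjectLister delegate;
    private final Predicate<String> delayed;
    private int inconsistentListingsLeft;

    DelayedVisibilityLister(ObjectLister delegate,
                            Predicate<String> delayed,
                            int inconsistentListings) {
        this.delegate = delegate;
        this.delayed = delayed;
        this.inconsistentListingsLeft = inconsistentListings;
    }

    @Override
    public List<String> listKeys(String prefix) {
        List<String> all = delegate.listKeys(prefix);
        if (inconsistentListingsLeft <= 0) {
            return all; // delay has elapsed: behave like the wrapped client
        }
        inconsistentListingsLeft--;
        List<String> visible = new ArrayList<>();
        for (String k : all) {
            if (!delayed.test(k)) {
                visible.add(k);
            }
        }
        return visible;
    }
}
```

A consistency test in this style would write a key that matches the predicate, list immediately through the wrapper, and assert that the key is present. Against the bare wrapper that assertion fails on the first listing, which is the behaviour a consistency layer is meant to mask.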
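
The "null" MetadataStore design choice described in the thread is an instance of the null object pattern, and a small sketch may make the benefit concrete. Everything below is hypothetical and far simpler than the real MetadataStore API in the branch: `SimpleMetadataStore`, its three methods, and `SketchFileSystemClient` are invented for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, much-simplified metadata store (not the real Hadoop API). */
interface SimpleMetadataStore {
    void put(String path, long length);

    /** Returns the recorded length, or null if this store has no record. */
    Long get(String path);

    void delete(String path);
}

/** Null object: records nothing and never has an answer. */
class NullSimpleMetadataStore implements SimpleMetadataStore {
    @Override
    public void put(String path, long length) { /* intentionally a no-op */ }

    @Override
    public Long get(String path) {
        return null;
    }

    @Override
    public void delete(String path) { /* intentionally a no-op */ }
}

/** A real store for comparison, backed by a map. */
class InMemorySimpleMetadataStore implements SimpleMetadataStore {
    private final Map<String, Long> entries = new HashMap<>();

    @Override
    public void put(String path, long length) {
        entries.put(path, length);
    }

    @Override
    public Long get(String path) {
        return entries.get(path);
    }

    @Override
    public void delete(String path) {
        entries.remove(path);
    }
}

/** Client code has one code path; it never checks whether the store is enabled. */
class SketchFileSystemClient {
    private final SimpleMetadataStore store;
    private final Map<String, Long> backingStore = new HashMap<>();
    int backingLookups = 0; // counts fall-throughs to the backing store

    SketchFileSystemClient(SimpleMetadataStore store) {
        this.store = store;
    }

    void create(String path, long length) {
        backingStore.put(path, length);
        store.put(path, length); // logged unconditionally
    }

    Long length(String path) {
        Long recorded = store.get(path);
        if (recorded != null) {
            return recorded;
        }
        backingLookups++;
        return backingStore.get(path);
    }
}
```

With `NullSimpleMetadataStore` every lookup falls through to the backing store, so the client behaves exactly as it would without the feature, and no `if (enabled)` branches are needed in the client.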