Thanks for the reply, Steve; it aligns with what Aaron said above. The sooner the better for this branch merge :)
On Thu, Aug 17, 2017 at 6:49 AM, Steve Loughran <ste...@hortonworks.com> wrote:

>
> On 16 Aug 2017, at 18:39, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.
>
>
> Code targets trunk, current state is ready to go in.
>
> I've also got it building & running against branch-2: all the code is
> Java-7 and the classpath problems were dealt with by Mingliang.
>
>
> Could you comment on testing and API stability of this branch? I'm
> trusting the judgement of the contributors involved, since there isn't much
> time to fix things before beta1.
>
>
>
> This is all working in the s3 code, and it's something you have to
> explicitly enable; I'm confident that when disabled it doesn't cause
> problems
>
> There are two modes of use in production (as well as a local dynamodb for
> testing)
>
> * dynamo DB as cache, "non authoritative"
> * dynamo DB as store of record, "authoritative"
>
> I'm fairly happy with non-auth; but as auth assumes that all clients are
> using s3guard, it's the one with the most risks. That one I'd be cautious
> over. But it does deliver the best speedup. And it lets you use the v1/v2
> algorithms to commit output, as now you get the consistent directory
> listings you need. There's still the O(data) COPY call, but at least the
> risk of incomplete listings -> incomplete copy operation is eliminated.
>
> We've had a preview version up for a while, running large hive/LLAP tests
> against it happily in particular, and my spark & cloud testing has shown
> all is well (indeed, I can show how all isn't well if you enable the
> inconsistent FS client and *don't* turn s3guard on).
>
> After the initial merge, there is more work to do, but mostly around:
> metrics, diagnostics, and the new committer work which depends on the
> consistent listings for one of the committers, but doesn't do *any* API
> calls into s3guard itself. All it needs is a consistent S3 endpoint, be it
> AWS S3 & S3Guard, or something else like the WDC cloud store. That's not
> going to be ready for Beta 1.
>
> -Steve
>
>
> Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>>
>> FYI, We're getting ready to merge the current S3Guard branch,
>> HADOOP-13345, via a patch
>> https://issues.apache.org/jira/browse/HADOOP-13998
>>
>> After that's done, we do plan to have a second iteration, work on a
>> 0-rename committer (HADOOP-13786) with all the other tuning and
>> improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
>> and/or do things patch-by-patch.
>>
>> Anyway, now is a great time for people to download and play
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>>
>> testing this
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>>
>> The Inconsistent AWS Client is also something everyone is free to use for
>> injecting inconsistencies (and soon faults) into their own apps by way of
>> 2-3 config options. Want to know how your code handles S3A being observably
>> inconsistent? We'll let you do that.
>>
>> -Steve
>>
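
For anyone skimming the thread, here is a toy Python model of the two modes Steve describes above (DynamoDB as cache vs. DynamoDB as store of record), and of why the authoritative one carries more risk. This is only a sketch under my own assumptions: every class and function name is hypothetical, none of it is S3Guard's actual API, and the real implementation tracks this per directory rather than globally.

```python
# Toy model only. All names are hypothetical, not S3Guard's real API.

class ToyS3:
    """Object store whose LIST results may lag behind recent PUTs."""
    def __init__(self):
        self.objects = set()   # everything actually written
        self.visible = set()   # what a LIST call currently returns

    def put(self, path):
        self.objects.add(path)             # written, but not yet listed

    def settle(self):
        self.visible = set(self.objects)   # listings catch up with writes

    def list(self):
        return set(self.visible)


class ToyMetadataStore:
    """Stands in for the DynamoDB table: remembers paths clients report."""
    def __init__(self):
        self.paths = set()

    def record(self, path):
        self.paths.add(path)

    def list(self):
        return set(self.paths)


def write(s3, store, path, use_guard=True):
    """Write an object; only guard-aware clients update the store."""
    s3.put(path)
    if use_guard:
        store.record(path)


def list_dir(s3, store, authoritative):
    """List paths in either mode."""
    if authoritative:
        # Store of record: trust the store and skip the S3 LIST.
        return store.list()
    # Cache: merge the (possibly stale) S3 LIST with the store's view.
    return s3.list() | store.list()
```

With this model, a file written by a guard-aware client shows up in both modes immediately, even while the raw S3 listing is still stale. A file written by a client that bypasses the store shows up in non-authoritative mode once S3 settles, but never in authoritative mode, which is the "assumes that all clients are using s3guard" risk in miniature.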