I think this is a great idea! Having just gone through the process of getting Phoenix up to speed with precommits, it would be really nice to have a place to go other than "fork/hack someone else's work". For the same project, I recently integrated its first daemon service. This meant adding a bunch of servicy Python code (multi-platform support is required) which I only sort of trust. Again, it would be great to have an explicit resource for this kind of thing in the ecosystem. I expect Calcite and Kylin will be following along shortly.
Since we're tossing out names, how about Apache Bootstrap? It's a meta-project to help other projects get off the ground, after all. -n

On Monday, June 15, 2015, Sean Busbey <bus...@cloudera.com> wrote:
> Oof. I had meant to push on this again but life got in the way and now the June board meeting is upon us. Sorry everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is, accepting outside contributions) face a common QA problem: vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so the tooling is often worked on ad hoc and few improvements get shared across projects. Since the tooling itself is never a primary concern, any progress made is rarely reused outside of ASF projects.
>
> Over the last couple of months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature out of all those in the various projects) and it's reached a point where we think we can start bringing on other downstream users. This means we need to start establishing things like a release cadence and to grow the new contributors we have so they can handle more project responsibility. Personally, I think that means it's time to move out from under Hadoop to drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making it work well in both the context of automated test systems like Jenkins and for individual developers is even more work. We could easily also take over maintenance of things like shelldocs, since test-patch is currently the primary consumer of that, but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools to help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes and not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it).
> Doing this for arbitrary software is obviously very difficult, and a part of easing that will be to make (and then favor) tooling to allow projects to have operational glue that looks the same. Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of e.g. Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current test-patch work or expressed interest in helping out on getting it used in other projects. Right now, the proposed PMC would be (alphabetical by last name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all-around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc, phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several ASF members and a bunch of folks familiar with the ASF. Combined with the code already existing in Apache spaces, I think that gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea snail, and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on the new TLP, but I hope that once we have a release that can be evaluated there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to talk through any feedback on that. Additionally, are there any other points folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> > Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >
> > On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> >> Hi Folks!
> >>
> >> After working on test-patch with other folks for the last few months, I think we've reached the point where we can make the fastest progress towards the goal of a general-use pre-commit patch tester by spinning things into a project focused on just that. I think we have a mature enough code base and a sufficient, if fledgling, community, so I'm going to put together a TLP proposal.
> >>
> >> Thanks for the feedback thus far from use within Hadoop. I hope we can continue to make things more useful.
> >>
> >> -Sean
> >>
> >> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >>
> >>> HBase's dev-support folder is where the scripts and support files live. We've only recently started adding anything to the maven builds that's specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd add in more if we ran into the same permissions problems y'all are having.
> >>>
> >>> There's also our precommit job itself, though it isn't large[2]. AFAIK, we don't properly back this up anywhere, we just notify each other of changes on a particular mail thread[3].
> >>>
> >>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red because I just finished fixing "mvn site" running out of permgen)
> >>> [3]: http://s.apache.org/NT0
> >>>
> >>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> >>>
> >>>> Sure, thanks Sean! Do we just look in the dev-support folder in the HBase repo? Is there any additional context we need to be aware of?
> >>>>
> >>>> Chris Nauroth
> >>>> Hortonworks
> >>>> http://hortonworks.com/
> >>>>
> >>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
> >>>>
> >>>> >+dev@hbase
> >>>> >
> >>>> >HBase has recently been cleaning up our precommit jenkins jobs to make them more robust. From what I can tell our stuff started off as an earlier version of what Hadoop uses for testing.
> >>>> >
> >>>> >Folks on either side open to an experiment of combining our precommit check tooling? In principle we should be looking for the same kinds of things.
> >>>> >
> >>>> >Naturally we'll still need different jenkins jobs to handle different resource needs and we'd need to figure out where stuff eventually lives, but that could come later.
> >>>> >
> >>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> >>>> >
> >>>> >> The only thing I'm aware of is the failOnError option:
> >>>> >>
> >>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>>> >>
> >>>> >> I prefer that we don't disable this, because ignoring different kinds of failures could leave our build directories in an indeterminate state. For example, we could end up with an old class file on the classpath for test runs that was supposedly deleted.
> >>>> >>
> >>>> >> I think it's worth exploring Eddy's suggestion to try simulating failure by placing a file where the code expects to see a directory. That might even let us enable some of these tests that are skipped on Windows, because Windows allows access for the owner even after permissions have been stripped.
> >>>> >>
> >>>> >> Chris Nauroth
> >>>> >> Hortonworks
> >>>> >> http://hortonworks.com/
> >>>> >>
> >>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
> >>>> >>
> >>>> >> >Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user), it's simply that the code refuses to do it.
> >>>> >> >
> >>>> >> >Otherwise I guess we can just fix those tests...
> >>>> >> >
> >>>> >> >Colin
> >>>> >> >
> >>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
> >>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>>> >> >>
> >>>> >> >> In HDFS-7722:
> >>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
> >>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
> >>>> >> >>
> >>>> >> >> Also I ran mvn test several times on my machine and all tests passed.
> >>>> >> >>
> >>>> >> >> However, since in DiskChecker#checkDirAccess():
> >>>> >> >>
> >>>> >> >>   private static void checkDirAccess(File dir) throws DiskErrorException {
> >>>> >> >>     if (!dir.isDirectory()) {
> >>>> >> >>       throw new DiskErrorException("Not a directory: " + dir.toString());
> >>>> >> >>     }
> >>>> >> >>
> >>>> >> >>     checkAccessByFileMethods(dir);
> >>>> >> >>   }
> >>>> >> >>
> >>>> >> >> One potentially safer alternative is replacing the data dir with a regular file to simulate disk failures.
> >>>> >> >>
> >>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> >>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure, TestDataNodeVolumeFailureReporting, and TestDataNodeVolumeFailureToleration all remove executable permissions from directories like the one Colin mentioned to simulate disk failures at data nodes. I reviewed the code for all of those, and they all appear to be doing the necessary work to restore executable permissions at the end of the test. The only recent uncommitted patch I've seen that makes changes in these test suites is HDFS-7722. That patch still looks fine though. I don't know if there are other uncommitted patches that changed these test suites.
> >>>> >> >>>
> >>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly died after removing executable permissions but before restoring them. That always would have been a weakness of these test suites, regardless of any recent changes.
> >>>> >> >>>
> >>>> >> >>> Chris Nauroth
> >>>> >> >>> Hortonworks
> >>>> >> >>> http://hortonworks.com/
> >>>> >> >>>
> >>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
> >>>> >> >>>
> >>>> >> >>>>Hey Colin,
> >>>> >> >>>>
> >>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with these boxes. He took a look and concluded that some perms are being set in those directories by our unit tests which are precluding those files from getting deleted. He's going to clean up the boxes for us, but we should expect this to keep happening until we can fix the test in question to properly clean up after itself.
> >>>> >> >>>>
> >>>> >> >>>>To help narrow down which commit it was that started this, Andrew sent me this info:
> >>>> >> >>>>
> >>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has 500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC on March 5th."
> >>>> >> >>>>
> >>>> >> >>>>--
> >>>> >> >>>>Aaron T. Myers
> >>>> >> >>>>Software Engineer, Cloudera
> >>>> >> >>>>
> >>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> >>>> >> >>>>
> >>>> >> >>>>> Hi all,
> >>>> >> >>>>>
> >>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find any jenkins jobs that succeeded from the last 24 hours. Most of them seem to be failing with some variant of this message:
> >>>> >> >>>>>
> >>>> >> >>>>> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on project hadoop-hdfs: Failed to clean project: Failed to delete /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3 -> [Help 1]
> >>>> >> >>>>>
> >>>> >> >>>>> Any ideas how this happened? Bad disk, unit test setting wrong permissions?
> >>>> >> >>>>>
> >>>> >> >>>>> Colin
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Lei (Eddy) Xu
> >>>> >> >> Software Engineer, Cloudera
> >>>> >
> >>>> >--
> >>>> >Sean
> >>>
> >>> --
> >>> Sean
> >>
> >> --
> >> Sean
> >
> > --
> > Sean
>
> --
> Sean
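
To make the failure mode discussed in this thread concrete: the TestDataNodeVolumeFailure* suites strip permissions from a DataNode data directory to fake a dead disk and restore them when the test finishes; if the JUnit JVM dies in between, the unreadable directory survives and maven-clean-plugin can no longer delete the workspace. Below is a rough, hypothetical sketch of that pattern, not the actual HDFS test code; the helper name withSimulatedDiskFailure and the use of plain java.io.File permission setters are illustrative assumptions.

    import java.io.File;

    /**
     * Illustrative sketch (not the actual HDFS test code) of the
     * permission-based disk-failure simulation described in the thread.
     * If the JVM is killed inside the try block, the finally never runs,
     * permissions are never restored, and a later "mvn clean" on the
     * Jenkins slave fails to delete the directory.
     */
    public class PermissionBasedFailureSketch {

      static void withSimulatedDiskFailure(File dataDir, Runnable testBody) {
        // Make the data dir unusable so the code under test treats the volume as failed.
        dataDir.setWritable(false, false);
        dataDir.setExecutable(false, false);
        try {
          testBody.run(); // run assertions against the "failed" volume
        } finally {
          // Restore permissions so later tests and the build's clean step still work.
          // This is exactly the step that is skipped if the process dies mid-test.
          dataDir.setExecutable(true, false);
          dataDir.setWritable(true, false);
        }
      }
    }

As Chris notes above, this approach also does nothing on Windows, where the owner keeps access even after the permission bits are stripped, which is why those tests are skipped there.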
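And a minimal sketch of the safer alternative Eddy and Chris float, swapping the data directory for a regular file so DiskChecker#checkDirAccess fails its isDirectory() check and throws DiskErrorException, again with made-up helper names that are not from HDFS-7722 or the existing test suites:

    import java.io.File;
    import java.io.IOException;

    /**
     * Hypothetical sketch of the "plain file where a directory is expected"
     * idea. No permissions are changed, so there is nothing to restore if
     * the JVM dies mid-test.
     */
    public class SimulatedVolumeFailure {

      /** Replace the volume directory with an empty regular file. */
      static void simulateFailedVolume(File dataDir) throws IOException {
        deleteRecursively(dataDir);
        if (!dataDir.createNewFile()) {
          throw new IOException("Could not create placeholder file " + dataDir);
        }
      }

      /** Put a real directory back so later tests are unaffected. */
      static void restoreVolume(File dataDir) throws IOException {
        if (!dataDir.delete() || !dataDir.mkdirs()) {
          throw new IOException("Could not restore directory " + dataDir);
        }
      }

      private static void deleteRecursively(File f) throws IOException {
        File[] children = f.listFiles();
        if (children != null) {
          for (File child : children) {
            deleteRecursively(child);
          }
        }
        if (!f.delete() && f.exists()) {
          throw new IOException("Failed to delete " + f);
        }
      }
    }

Even if restoreVolume is never called, what is left behind is an ordinary file with default permissions, so a crashed run does not leave the Jenkins workspace in a state that "mvn clean" cannot recover from.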