Thank you for making a more digestible version, Allen. :)

If you're interested in soliciting feedback from other projects, I
created ASF short links to this thread in common-dev and hbase:

* http://s.apache.org/yetus-discuss-hadoop
* http://s.apache.org/yetus-discuss-hbase

While I agree that it's important to get feedback from ASF projects that
might find this useful, I can say that recently I've been involved in the
non-ASF project YCSB, and both the pretest and better shell stuff would be
immensely useful over there.

On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer <a...@altiscale.com> wrote:

> I'm clearly +1 on this idea. As part of the rewrite in Hadoop of
> test-patch, it was amazing to see how far and wide this bit of code has
> spread. So I see consolidating everyone's efforts as a huge win for a
> large number of projects (especially considering how many I saw
> suffering from a variety of identified bugs!).
>
> But…
>
> I think it's important for people involved in those other projects
> to speak up and voice an opinion as to whether this is useful.
>
> To summarize:
>
> In the short term, a single location to get/use a precommit patch
> tester rather than everyone building/supporting their own in their spare
> time.
>
> FWIW, we've already got the code base modified to be pluggable.
> We've written some basic/simple plugins that support Hadoop, HBase, Tajo,
> Tez, Pig, and Flink. For HBase and Flink, this does include their custom
> checks. Adding support for other projects shouldn't be hard. Simple
> projects take almost no time after seeing the basic pattern.
>
> I think it's worthwhile highlighting that means support for both
> JIRA and GitHub as well as Ant and Maven from the same code base.
>
> Longer term:
>
> Well, we clearly have ideas of things that we want to do. Adding
> more features to test-patch (review board? gradle?) is obvious. But what
> about teasing apart and generalizing some of the other shell bits from
> projects?
> A common library for building CLI tools to fault injection to
> release documentation creation tools to … I'd even like to see us get as
> advanced as a "run this program to auto-generate daemon stop/start bits".
>
> I had a few chats with people about this idea at Hadoop Summit.
> What's truly exciting are the ideas that people had once they realized
> what kinds of problems we're trying to solve. It's always amazing the
> problems that projects have that could be solved by these types of
> solutions. Let's stop hiding our cool toys in this area.
>
> So, what feedback and ideas do you have in this area? Are you a
> yay or a nay?
>
> On Jun 15, 2015, at 4:47 PM, Sean Busbey <bus...@cloudera.com> wrote:
>
> > Oof. I had meant to push on this again but life got in the way and now
> > the June board meeting is upon us. Sorry everyone. In the event that
> > this ends up contentious, hopefully one of the copied communities can
> > give us a branch to work in.
> >
> > I know everyone is busy, so here's the short version of this email: I'd
> > like to move some of the code currently in Hadoop (test-patch) into a new
> > TLP focused on QA tooling. I'm not sure what the best format for priming
> > this conversation is. ORC filled in the incubator project proposal
> > template, but I'm not sure how much that confused the issue. So to start,
> > I'll just write what I'm hoping we can accomplish in general terms here.
> >
> > All software development projects that are community based (that is,
> > accepting outside contributions) face a common QA problem for vetting
> > in-coming contributions. Hadoop is fortunate enough to be sufficiently
> > popular that the weight of the problem drove tool development (i.e.
> > test-patch). That tool is generalizable enough that a bunch of other TLPs
> > have adopted their own forks.
> > Unfortunately, in most projects this kind of QA work is an enabler
> > rather than a primary concern, so often the tooling is worked on ad-hoc
> > and few shared improvements happen across projects. Since the tooling
> > itself is never a primary concern, any improvements made are rarely
> > reused outside of ASF projects.
> >
> > Over the last couple months a few of us have been working on generalizing
> > the tooling present in the Hadoop code base (because it was the most
> > mature out of all those in the various projects) and it's reached a point
> > where we think we can start bringing on other downstream users. This
> > means we need to start establishing things like a release cadence and to
> > grow the new contributors we have to handle more project responsibility.
> > Personally, I think that means it's time to move out from under Hadoop to
> > drive things as our own community. Eventually, I hope the community can
> > help draw in a group of folks traditionally underrepresented in ASF
> > projects, namely QA and operations folks.
> >
> > I think test-patch by itself has enough scope to justify a project.
> > Having a solid set of build tools that are customizable to fit the norms
> > of different software communities is a bunch of work. Making it work well
> > in both the context of automated test systems like Jenkins and for
> > individual developers is even more work. We could easily also take over
> > maintenance of things like shelldocs, since test-patch is the primary
> > consumer of that currently but it's generally useful tooling.
> >
> > In addition to test-patch, I think the proposed project has some future
> > growth potential. Given some adoption of test-patch to prove utility, the
> > project could build on the ties it makes to start building tools to help
> > projects do their own longer-run testing. Note that I'm talking about the
> > tools to build QA processes and not a particular set of tested
> > components.
> > Specifically, I think the ChaosMonkey work that's in HBase should be
> > generalizable as a fault injection framework (either based on that code
> > or something like it). Doing this for arbitrary software is obviously
> > very difficult, and a part of easing that will be to make (and then
> > favor) tooling to allow projects to have operational glue that looks the
> > same. Namely, the shell work that's been done in hadoop-functions.sh
> > would be a great foundational layer that could bring good daemon handling
> > practices to a whole slew of software projects. In the event that these
> > frameworks and tools get adopted by parts of the Hadoop ecosystem, that
> > could make the job of e.g. Bigtop substantially easier.
> >
> > I've reached out to a few folks who have been involved in the current
> > test-patch work or expressed interest in helping out on getting it used
> > in other projects. Right now, the proposed PMC would be (alphabetical by
> > last name):
> >
> > * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
> >   jclouds pmc, sqoop pmc, all around Jenkins expert)
> > * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> > * Nick Dimiduk (hbase pmc, phoenix pmc)
> > * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> > * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> >   phoenix pmc)
> > * Allen Wittenauer (hadoop committer)
> >
> > That PMC gives us several members and a bunch of folks familiar with the
> > ASF. Combined with the code already existing in Apache spaces, I think
> > that gives us sufficient justification for a direct board proposal.
> >
> > The planned project name is "Apache Yetus". It's an archaic genus of sea
> > snail and most of our project will be focused on shell scripts.
> >
> > N.b.: this does not mean that the Hadoop community would _have_ to rely
> > on the new TLP, but I hope that once we have a release that can be
> > evaluated there'd be enough benefit to strongly encourage it.
> >
> > This has mostly been focused on scope and community issues, and I'd love
> > to talk through any feedback on that. Additionally, are there any other
> > points folks want to make sure are covered before we have a resolution?
> >
> > On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> >> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >>
> >> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >>
> >>> Hi Folks!
> >>>
> >>> After working on test-patch with other folks for the last few months, I
> >>> think we've reached the point where we can make the fastest progress
> >>> towards the goal of a general use pre-commit patch tester by spinning
> >>> things into a project focused on just that. I think we have a mature
> >>> enough code base and a sufficient fledgling community, so I'm going to
> >>> put together a TLP proposal.
> >>>
> >>> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >>> continue to make things more useful.
> >>>
> >>> -Sean
> >>>
> >>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >>>
> >>>> HBase's dev-support folder is where the scripts and support files
> >>>> live. We've only recently started adding anything to the maven builds
> >>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but
> >>>> that's where I'd add in more if we ran into the same permissions
> >>>> problems y'all are having.
> >>>>
> >>>> There's also our precommit job itself, though it isn't large[2].
> >>>> AFAIK, we don't properly back this up anywhere, we just notify each
> >>>> other of changes on a particular mail thread[3].
> >>>>
> >>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
> >>>> all red because I just finished fixing "mvn site" running out of
> >>>> permgen)
> >>>> [3]: http://s.apache.org/NT0
> >>>>
> >>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> >>>>
> >>>>> Sure, thanks Sean! Do we just look in the dev-support folder in the
> >>>>> HBase repo? Is there any additional context we need to be aware of?
> >>>>>
> >>>>> Chris Nauroth
> >>>>> Hortonworks
> >>>>> http://hortonworks.com/
> >>>>>
> >>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
> >>>>>
> >>>>>> +dev@hbase
> >>>>>>
> >>>>>> HBase has recently been cleaning up our precommit jenkins jobs to
> >>>>>> make them more robust. From what I can tell our stuff started off as
> >>>>>> an earlier version of what Hadoop uses for testing.
> >>>>>>
> >>>>>> Folks on either side open to an experiment of combining our
> >>>>>> precommit check tooling? In principle we should be looking for the
> >>>>>> same kinds of things.
> >>>>>>
> >>>>>> Naturally we'll still need different jenkins jobs to handle
> >>>>>> different resource needs and we'd need to figure out where stuff
> >>>>>> eventually lives, but that could come later.
> >>>>>>
> >>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> >>>>>>
> >>>>>>> The only thing I'm aware of is the failOnError option:
> >>>>>>>
> >>>>>>> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>>>>>>
> >>>>>>> I prefer that we don't disable this, because ignoring different
> >>>>>>> kinds of failures could leave our build directories in an
> >>>>>>> indeterminate state.
> >>>>>>> For example, we could end up with an old class file on the
> >>>>>>> classpath for test runs that was supposedly deleted.
> >>>>>>>
> >>>>>>> I think it's worth exploring Eddy's suggestion to try simulating
> >>>>>>> failure by placing a file where the code expects to see a
> >>>>>>> directory. That might even let us enable some of these tests that
> >>>>>>> are skipped on Windows, because Windows allows access for the owner
> >>>>>>> even after permissions have been stripped.
> >>>>>>>
> >>>>>>> Chris Nauroth
> >>>>>>> Hortonworks
> >>>>>>> http://hortonworks.com/
> >>>>>>>
> >>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
> >>>>>>>
> >>>>>>>> Is there a maven plugin or setting we can use to simply remove
> >>>>>>>> directories that have no executable permissions on them? Clearly
> >>>>>>>> we have the permission to do this from a technical point of view
> >>>>>>>> (since we created the directories as the jenkins user), it's
> >>>>>>>> simply that the code refuses to do it.
> >>>>>>>>
> >>>>>>>> Otherwise I guess we can just fix those tests...
> >>>>>>>>
> >>>>>>>> Colin
> >>>>>>>>
> >>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
> >>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
> >>>>>>>>>
> >>>>>>>>> In HDFS-7722:
> >>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>>>>>>> TearDown().
> >>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >>>>>>>>>
> >>>>>>>>> Also I ran mvn test several times on my machine and all tests
> >>>>>>>>> passed.
> >>>>>>>>>
> >>>>>>>>> However, since in DiskChecker#checkDirAccess():
> >>>>>>>>>
> >>>>>>>>>   private static void checkDirAccess(File dir)
> >>>>>>>>>       throws DiskErrorException {
> >>>>>>>>>     if (!dir.isDirectory()) {
> >>>>>>>>>       throw new DiskErrorException("Not a directory: "
> >>>>>>>>>           + dir.toString());
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     checkAccessByFileMethods(dir);
> >>>>>>>>>   }
> >>>>>>>>>
> >>>>>>>>> One potentially safer alternative is replacing data dir with a
> >>>>>>>>> regular file to simulate disk failures.
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >>>>>>>>> <cnaur...@hortonworks.com> wrote:
> >>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>>>>>>>> TestDataNodeVolumeFailureReporting, and
> >>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
> >>>>>>>>>> permissions from directories like the one Colin mentioned to
> >>>>>>>>>> simulate disk failures at data nodes. I reviewed the code for
> >>>>>>>>>> all of those, and they all appear to be doing the necessary work
> >>>>>>>>>> to restore executable permissions at the end of the test. The
> >>>>>>>>>> only recent uncommitted patch I've seen that makes changes in
> >>>>>>>>>> these test suites is HDFS-7722. That patch still looks fine
> >>>>>>>>>> though. I don't know if there are other uncommitted patches that
> >>>>>>>>>> changed these test suites.
> >>>>>>>>>>
> >>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly
> >>>>>>>>>> died after removing executable permissions but before restoring
> >>>>>>>>>> them. That always would have been a weakness of these test
> >>>>>>>>>> suites, regardless of any recent changes.
> >>>>>>>>>>
> >>>>>>>>>> Chris Nauroth
> >>>>>>>>>> Hortonworks
> >>>>>>>>>> http://hortonworks.com/
> >>>>>>>>>>
> >>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hey Colin,
> >>>>>>>>>>>
> >>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going
> >>>>>>>>>>> on with these boxes. He took a look and concluded that some
> >>>>>>>>>>> perms are being set in those directories by our unit tests
> >>>>>>>>>>> which are precluding those files from getting deleted. He's
> >>>>>>>>>>> going to clean up the boxes for us, but we should expect this
> >>>>>>>>>>> to keep happening until we can fix the test in question to
> >>>>>>>>>>> properly clean up after itself.
> >>>>>>>>>>>
> >>>>>>>>>>> To help narrow down which commit it was that started this,
> >>>>>>>>>>> Andrew sent me this info:
> >>>>>>>>>>>
> >>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>>>>>>>>> has 500 perms, so I'm guessing that's the problem. Been that
> >>>>>>>>>>> way since 9:32 UTC on March 5th."
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Aaron T. Myers
> >>>>>>>>>>> Software Engineer, Cloudera
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
> >>>>>>>>>>> <cmcc...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find
> >>>>>>>>>>>> any jenkins jobs that succeeded from the last 24 hours.
> >>>>>>>>>>>> Most of them seem to be failing with some variant of this
> >>>>>>>>>>>> message:
> >>>>>>>>>>>>
> >>>>>>>>>>>> [ERROR] Failed to execute goal
> >>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean
> >>>>>>>>>>>> (default-clean) on project hadoop-hdfs: Failed to clean
> >>>>>>>>>>>> project: Failed to delete
> >>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>>>>>>>>>> -> [Help 1]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Any ideas how this happened? Bad disk, unit test setting wrong
> >>>>>>>>>>>> permissions?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Colin
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Lei (Eddy) Xu
> >>>>>>>>> Software Engineer, Cloudera
> >>>>>>
> >>>>>> --
> >>>>>> Sean
> >>>>
> >>>> --
> >>>> Sean
> >>>
> >>> --
> >>> Sean
> >>
> >> --
> >> Sean
> >
> > --
> > Sean

--
Sean