All, I want to start a discussion about future approaches to performing Hadoop system (and potentially other types of) testing in 0.22 and later.
As many of you know, a recent development effort by a number of Hadoop developers has brought into existence a new system test framework code-named Herriot. If you haven't heard of it yet, please check HADOOP-6332 and http://wiki.apache.org/hadoop/HowToUseSystemTestFramework

Now, Herriot is a great tool that allows much wider and more powerful inspection of, and intervention into, remote Hadoop daemons (i.e. observability and controllability). There is a catch, however: such power comes at the cost of build instrumentation.

On the other hand, there is a fairly large number of cases where no introspection into the daemons' internals is required. These can be carried out through simple communication via the Hadoop CLI: testing ACL refreshes, basic file operations, etc. However, there is not yet any common understanding, let alone agreement, on how this should be done. I'd like to start a conversation that will, hopefully, let us work out some tactics.

I can see three possible approaches (there might be more that I just don't see):

1) Add a special mode to Herriot for working with non-instrumented clusters. In such a mode (let's call it 'standard' for now) the framework would offer only reduced functionality, such as:
   - start/stop a remote daemon
   - change/push a daemon configuration
   - simple(-er) interfaces to HDFS via DFSClient
   - simple(-er) interfaces to work with MR
   - (the list can apparently be extended)
   (A very rough sketch of what such an API might look like is appended after my signature.)

2) A Groovy (or even bash) front-end for system tests. The latter is a pretty poor option, in my opinion, because unlike Groovy the Unix shell provides no way to work with the public Hadoop (Java) APIs directly. Groovy, on the other hand, is much more expressive than Java; it is highly dynamic and provides MOP among other things. (Please, let's not start a Groovy vs. Scala discussion here!)

3) Create custom SSH-based command executors on top of CLITestHelper and then reuse the rest of that infrastructure to build tests similar to TestCLI. (Again, a rough sketch is appended after my signature.)

My ultimate goal is, essentially, to have a single unified test driver/framework (such as JUnit) control the execution of all or most types of tests, from the TUT (true unit test) end up to system and, potentially, load tests. One benefit of such an approach is that it facilitates integrating these other types of testing into our CI infrastructure (read: Hudson); it also gives us a well-supported test development environment that is familiar to many, lowering the learning curve for potential contributors who might want to join the Hadoop community and help us make Hadoop an even better product.

--
With best regards,
  Konstantin Boudnik (aka Cos)

A212 4206 7EC6 F8BF 20E6 7C37 32A5 E27E 4C03 A1A1
Attention! Streams of consciousness are disallowed
Cos' pubkey: http://people.apache.org/~cos/cos.asc
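
P.S. To make (1) a bit more concrete, here is a purely hypothetical sketch of what a reduced 'standard' mode API could look like. None of these names exist in Herriot today; they are made up only to illustrate the shape of the thing (only Configuration and FileSystem are real Hadoop classes).

  // Purely hypothetical sketch of a "standard" (non-instrumented) mode API.
  // All names below are invented for illustration; they are not part of Herriot.
  package org.apache.hadoop.test.system.standard;   // hypothetical package

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  /**
   * What a non-instrumented cluster handle might expose: daemon lifecycle,
   * configuration pushes and plain client access, with no in-daemon introspection.
   */
  public interface StandardCluster {
    /** Start a daemon of the given type (e.g. "namenode") on a remote host. */
    void startDaemon(String host, String daemonType) throws IOException;

    /** Stop a daemon of the given type on a remote host. */
    void stopDaemon(String host, String daemonType) throws IOException;

    /** Push a new configuration to a remote daemon (picked up on restart). */
    void pushConfig(String host, Configuration conf) throws IOException;

    /** Plain client-side view of HDFS; no daemon internals involved. */
    FileSystem getFileSystem() throws IOException;

    /** Run a Hadoop CLI command against the cluster and return its exit code. */
    int runCLICommand(String... args) throws IOException;
  }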
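
And a similarly hypothetical sketch for (3): driving plain Hadoop CLI calls from a JUnit test through an ssh-based executor. The host name, paths and class names below are placeholders, and the executor is deliberately naive (ProcessBuilder + ssh) just to show that no build instrumentation is needed; a real version would presumably sit on top of CLITestHelper as described above.

  // Hypothetical example: running Hadoop CLI commands over ssh from JUnit.
  // Host name, paths and the class itself are placeholders for illustration.
  import static org.junit.Assert.assertEquals;

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;

  import org.junit.Test;

  public class TestRemoteCLI {

    /** Runs a command on a remote gateway box via ssh and returns its exit code. */
    private int runRemotely(String host, String command)
        throws IOException, InterruptedException {
      Process p = new ProcessBuilder("ssh", host, command)
          .redirectErrorStream(true)
          .start();
      BufferedReader out =
          new BufferedReader(new InputStreamReader(p.getInputStream()));
      String line;
      while ((line = out.readLine()) != null) {
        System.out.println(line);          // keep the CLI output in the test log
      }
      return p.waitFor();
    }

    @Test
    public void basicFileOpsViaCLI() throws Exception {
      // "gateway.example.com" and the target path are placeholders.
      String host = "gateway.example.com";
      assertEquals(0, runRemotely(host, "hadoop fs -mkdir /tmp/cli-smoke"));
      assertEquals(0, runRemotely(host, "hadoop fs -ls /tmp/cli-smoke"));
      assertEquals(0, runRemotely(host, "hadoop fs -rmr /tmp/cli-smoke"));
    }
  }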