HBase did something similar, so I'm linking it here in case it helps you, Andrey: https://issues.apache.org/jira/browse/HBASE-4602

On Sat, Jul 28, 2012 at 3:00 AM, Andrey Klochkov <akloch...@griddynamics.com> wrote:
> Hello,
>
> It's quite noticeable that testing hadoop-hdfs and hadoop-mapreduce
> (0.23/1.0/2.0) takes a lot of time, which has a number of obvious
> downsides. My team and I are trying to analyze the reasons and
> identify possible improvements, and in particular we noticed that
> in recent years there were a number of attempts to optimize and
> speed up HDFS/MR junit tests, namely:
>
> 1. Introducing unit test framework
>
> A number of pure unit tests (mock-based, non-integration) were added,
> see HDFS-669, MAPREDUCE-1050, HADOOP-6423.
>
> However, it seems that these tests are not separated from integration
> tests (MiniCluster-based); some of them were moved to the
> hadoop-hdfs/src/tests/unit and hadoop-mapreduce-project/src/test/unit
> directories and disabled in mavenized builds starting from 0.23. There
> was an attempt to fix this in HDFS-2276, but it's still unresolved.
>
> 2. Smoke tests (10 minutes test target)
>
> There was a successful initiative to select a subset of tests in the
> HDFS and MapReduce modules to be used as smoke tests with a running time
> < 10 minutes. The tests were chosen manually, with the condition of
> having large code coverage in the most important packages/classes.
> This was done prior to 0.23/2.0, in Ant builds, see HADOOP-5628,
> HDFS-458, MAPREDUCE-670.
>
> Apparently, mavenized builds do not use this feature.
>
> 3. Separating tests into categories. HADOOP-6399 - open since 2009.
>
> In general, separating tests into categories, having fast, true unit
> tests in addition to the great coverage from the integration/component
> tests Hadoop has now, and then sets of capacity/availability tests --
> those things would help to make Hadoop more stable, and the development
> and release process less painful, etc.
>
> So would it be useful to do some cleaning, stabilizing and enhancing of
> the existing unit/integration tests, and to assemble a suite of pure unit
> tests and short-running integration tests, with coverage measured for all
> three sets (unit, smoke, full)? Is it worth pursuing this? What's the
> best place to start? Is it worth completing items 1 and 2
> mentioned above? Any comments or hints would be really appreciated.
>
> --
> Andrey Klochkov

--
Harsh J