+1 on the proposal, and adding more to it. A lot of time has been spent on improving the test runtime and bringing down the number of flaky tests. The following jiras should give an overview of the effort involved:
https://issues.apache.org/jira/browse/HIVE-14547
https://issues.apache.org/jira/browse/HIVE-13503
Committers, please ensure that reported failures are definitely unrelated to a patch before committing it. I would also propose the following to keep the test suite clean, along with some tips to keep test runs fast:

1) Revert any patch that causes a failure. It should be the contributor's responsibility to make sure the patch does not cause failures. I am against creating follow-up jiras for fixing test failures, because they usually get ignored or given lower priority, wasting effort and time on failure analysis for every other developer waiting to commit a patch.
2) Reviewers should +1 only AFTER a clean green run, or if they are convinced the test failures are unrelated. Maybe we should stop giving conditional +1s and wait for a clean green run.
3) Avoid slow tests (there is a jira to print out the runtime of newly added tests). In general, it is better to have many smaller tests than a single big test. If a qfile or JUnit test is big, splitting it up helps parallelize the tests and avoid stragglers.
4) Avoid adding tests to MiniMr (the slowest of all).
5) Try to keep the test runtime under a minute (see the surefire-report to get the correct runtime, excluding initialization time).
6) Avoid adding more read-only tables to the init script, as this increases the initialization time.
7) If a test case does not require an explain plan, avoid it, since most failures are explain diffs.
8) If a test case requires explain and does not depend on table or partition stats, explicitly set the stats for the table or partition. Explicitly setting stats avoids expensive stats computation time and flakiness due to stats diffs.
9) Prefer JUnit tests over qtests.
10) Add an explicit timeout to JUnit tests to avoid tests hanging indefinitely (surefire times out after 40 mins).

Thoughts?
Thanks
Prasanth

On Oct 13, 2016, at 11:10 PM, Siddharth Seth <ss...@apache.org> wrote:

There's been a lot of work to make the test runs faster, as well as more reliable, via HIVE-14547, HIVE-13503, and several other jiras. Test runtimes are around the 1-hour mark and going down. There have even been a few green pre-commit runs (after years?). At the same time, there are still some flaky tests. We really should try to keep the test runtimes down, as well as the number of failures, so that the pre-commit runs can provide useful information.

I'm not sure what the current approach is w.r.t. pre-commit runs before a commit. What I've seen in other projects is that the pre-commit needs to run, and come back (mostly) clean, before a commit goes in. Between what used to be 5-day wait times and inconsistent runs, I don't think this has always been followed in Hive. It would be useful to start relying on pre-commit test results again. Given the flaky tests, I'd suggest the following:

1. Pre-commit must be run on a patch before committing (with very few exceptions).
2. A green test run is ideal.
3. In case there are failures, keep track of them as sub-jiras under a flaky test umbrella jira (some are under HIVE-14547 already), to be eventually fixed.
4. Before committing, cite the relevant jira for each flaky test (create one and cite it if it doesn't already exist). This should help us build up a list of flaky tests over various runs, which will hopefully get fixed at some point.

Thoughts?

Thanks,
Sid