[ https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated HIVE-12316:
------------------------------
    Attachment: HIVE-12316.patch

An initial patch. Apologies as I know this is large and a lot to absorb. I did try to be exhaustive in the javadoc, which covers both the design and the usage.

> Improved integration test for Hive
> ----------------------------------
>
>                 Key: HIVE-12316
>                 URL: https://issues.apache.org/jira/browse/HIVE-12316
>             Project: Hive
>          Issue Type: New Feature
>          Components: Testing Infrastructure
>    Affects Versions: 2.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up
> with tests that cover the same functionality with different permutations of
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster. This
> means we cannot run any of those tests at scale. The HBase community by
> contrast uses the same test suite locally and on a cluster, and has found
> that this helps them greatly in testing.
> * Golden files are a grievous evil. Test writers are forced to eyeball
> results the first time they run a test and decide whether they look
> reasonable, which is error prone and makes testing at scale impossible. And
> changes to one part of Hive often end up changing the plan (and the output of
> explain) thus breaking many tests that are not related. This is particularly
> an issue for people working on the optimizer.
> * The lack of ability to run on a cluster means that when people test Hive at
> scale, they are forced to develop custom frameworks which can't then benefit
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
>
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark,
> orc/parquet/text/rcfile, secure/non-secure etc.) This doesn't mean it would
> run every permutation every time, but that the tester could choose which
> permutation to run.
> * The same tests should run locally and on a cluster. The tests should
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this
> should work with the scaling of inputs. The dev should be able to provide
> expected results or custom expected result generation in cases where
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and
> statistics) and quickly incorporate user queries so that tests from user
> scenarios can be quickly incorporated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
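
To illustrate the first requirement in the description (one test, many permutations, with the tester choosing which to run), here is a minimal Java sketch. It is not taken from the attached patch. The class `PermutationSketch`, its enums, and the `all`/`select` methods are hypothetical names made up for this example. Only the permutation axes (mr/tez/spark, orc/parquet/text/rcfile, secure/non-secure) come from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: enumerate the permutations named in the issue and let
// the tester pick a subset. None of these names come from the Hive patch.
public class PermutationSketch {
    enum Engine { MR, TEZ, SPARK }
    enum FileFormat { ORC, PARQUET, TEXT, RCFILE }
    enum Security { SECURE, NON_SECURE }

    // One concrete combination of the tunable knobs.
    static final class Permutation {
        final Engine engine;
        final FileFormat format;
        final Security security;

        Permutation(Engine engine, FileFormat format, Security security) {
            this.engine = engine;
            this.format = format;
            this.security = security;
        }

        @Override
        public String toString() {
            return engine + "/" + format + "/" + security;
        }
    }

    // Every combination of engine, file format, and security mode.
    static List<Permutation> all() {
        List<Permutation> out = new ArrayList<>();
        for (Engine e : Engine.values()) {
            for (FileFormat f : FileFormat.values()) {
                for (Security s : Security.values()) {
                    out.add(new Permutation(e, f, s));
                }
            }
        }
        return out;
    }

    // Only the combinations the tester chose to run.
    static List<Permutation> select(Predicate<Permutation> chosen) {
        List<Permutation> out = new ArrayList<>();
        for (Permutation p : all()) {
            if (chosen.test(p)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Example choice: run the same test on Tez with ORC, in both security modes.
        List<Permutation> toRun =
            select(p -> p.engine == Engine.TEZ && p.format == FileFormat.ORC);
        for (Permutation p : toRun) {
            System.out.println("would run test under " + p);
        }
    }
}
```

In a JUnit-based framework, the selected list could feed a parameterized test so that a single test body runs once per chosen permutation. How the actual patch does this is described in its javadoc, not here.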