[ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174292#comment-13174292 ]

Alan Gates commented on HIVE-2670:
----------------------------------

Attached a first patch.  This is not ready for inclusion yet, I'm just putting 
it up here to start getting feedback.  The following will need to be resolved 
before it is checked in:
# Currently the base harness code is simply included as a tar file.  It really 
should be pulled in from the Pig code base with svn:externals, as HCatalog does.
# I am not sure whether this is the right place in SVN.  I put everything in a 
test-e2e directory directly under trunk.  I need feedback on whether this is a 
good spot or whether somewhere else would be preferred.
# Connect the top level build.xml to this so it is possible to invoke the tests 
from the top level directory.  I was waiting to do this until I had feedback on 
the proper directory structure.

How to use it:

After applying the patch you will need to copy the harness.tar file (attached) 
to test-e2e, since that is not done for you by the patch tool.

First you need an existing Hadoop cluster (it can be very small, just a few 
nodes) and a MySQL database.  I ran my tests against Hadoop 0.20.205.0, but 
this should run against any 0.20.x version of Hadoop.  Then:
# Run the script test-e2e/scripts/create_test_db.sql against your MySQL 
database as a user that can create users and databases and grant privileges to 
users (root is a good choice).
# Run "ant package" in the top level Hive directory
# cd test-e2e
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> deploy
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> test

Usually <path_to_hive_you_want_to_test> will be $PWD/../build/dist, i.e. the 
build/dist directory created by "ant package".
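
The build, deploy and test steps can be scripted.  Here is a minimal Python 
sketch (not part of the patch) that assembles the same ant invocations and 
runs them in the right directories.  The names e2e_steps and run_steps are 
hypothetical, the MySQL setup step is assumed to be done already, and the 
target that actually runs the tests is assumed to be named "test".

```python
import os
import shlex
import subprocess


def e2e_steps(hadoop_home, hive_home=None, hive_src="."):
    """Return (working_dir, argv) pairs for the build, deploy and test steps.

    Hypothetical helper; assumes create_test_db.sql has already been run.
    """
    e2e_dir = os.path.join(hive_src, "test-e2e")
    if hive_home is None:
        # Default: build/dist under the Hive source tree, i.e. ../build/dist
        # as seen from the test-e2e directory.
        hive_home = os.path.abspath(os.path.join(hive_src, "build", "dist"))
    props = [
        f"-Dharness.hadoop.home={hadoop_home}",
        f"-Dharness.hive.home={hive_home}",
    ]
    return [
        (hive_src, ["ant", "package"]),        # build Hive from the top level
        (e2e_dir, ["ant", *props, "deploy"]),  # deploy step from the list above
        (e2e_dir, ["ant", *props, "test"]),    # assumed name of the test target
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first command that fails."""
    for cwd, argv in steps:
        print(f"[{cwd}] {shlex.join(argv)}")
        subprocess.run(argv, cwd=cwd, check=True)
```

From the top-level Hive directory this would be invoked as 
run_steps(e2e_steps("/path/to/hadoop")), where the Hadoop path is a 
placeholder for your own installation.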

The basic design of this test harness is that each test consists of three 
phases:  run_test, generate_benchmark, and compare_results.  In run_test, the 
test itself is run.  generate_benchmark runs the same or a similar test against 
a known source of truth.  compare_results then compares the two results and 
declares the test to have succeeded, failed, or aborted.  The harness delegates 
each of these three functions to drivers that are specific to different types 
of tests.

This patch includes two drivers: a Hive driver and a Hive command line driver.  
The Hive driver uses the MySQL database as its source of truth.  Each SQL script 
is run against both Hive and MySQL, and the two results are compared using the 
Unix cksum tool.
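
The checksum comparison can be sketched in Python.  posix_cksum reimplements 
the CRC printed by the POSIX cksum utility (an MSB-first CRC with polynomial 
0x04C11DB7 over the data followed by its length, then complemented); 
results_match is a hypothetical helper that compares checksum and byte count, 
since cksum reports both.  The sketch assumes the Hive and MySQL outputs have 
already been written in an identical text format (same delimiters, row order 
and number formatting); how the Hive driver normalizes them is not covered here.

```python
def posix_cksum(data: bytes) -> int:
    """Return the CRC that the POSIX `cksum` utility prints for `data`."""
    # cksum appends the data length, least significant byte first, using
    # only as many bytes as needed (none for empty input).
    length_bytes = bytearray()
    n = len(data)
    while n:
        length_bytes.append(n & 0xFF)
        n >>= 8
    crc = 0
    for byte in bytes(data) + bytes(length_bytes):
        crc ^= byte << 24
        for _ in range(8):
            # Shift out the top bit; if it was set, fold in the polynomial.
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    # Final one's complement, as required by POSIX.
    return crc ^ 0xFFFFFFFF


def results_match(hive_output: bytes, mysql_output: bytes) -> bool:
    """Hypothetical comparison: same checksum and same byte count."""
    return (posix_cksum(hive_output), len(hive_output)) == (
        posix_cksum(mysql_output),
        len(mysql_output),
    )
```

posix_cksum(b"") returns 4294967295, matching the output of printf '' | cksum.  
A checksum only says whether the two outputs are byte-for-byte identical; it 
does not say where they differ.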

For more information on the test harness, including how to add tests to it, see 
the Pig HowToTest page (https://cwiki.apache.org/confluence/display/PIG/HowToTest).  
The Hive driver does not yet support running alternate SQL for benchmarking or 
using an old version of Hive for the benchmarks, though both should be added 
eventually.

                
> A cluster test utility for Hive
> -------------------------------
>
>                 Key: HIVE-2670
>                 URL: https://issues.apache.org/jira/browse/HIVE-2670
>             Project: Hive
>          Issue Type: New Feature
>          Components: Testing Infrastructure
>            Reporter: Alan Gates
>         Attachments: harness.tar, hive_cluster_test.patch
>
>
> Hive has an extensive set of unit tests, but it does not have an 
> infrastructure for testing in a cluster environment.  Pig and HCatalog have 
> been using a test harness for cluster testing for some time.  We have written 
> Hive drivers and tests to run in this harness.

--
This message is automatically generated by JIRA.

        
