Hi Radek,

I've been using MiniMRCluster and MiniDFSCluster to run unit tests locally, which has given me a decent cycle time. I have fixture tables in my test/resources that I load into the MiniHiveCluster as part of test setup. I loosely based my code on QTestUtil in the Hive trunk (which I could not figure out how to use directly).
Andrew

On Feb 18, 2011, at 3:21 PM, Kirk True wrote:

> Hi Radek,
>
> I'm actually in the process of running the map-join unit tests against
> EMR as we speak. It's possible but dog slow :)
>
> Thanks,
> Kirk
>
> On 2/18/11 11:09 AM, Edward Capriolo wrote:
>> On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek
>> <radek.macias...@gmail.com> wrote:
>>> Hello,
>>> I was wondering if anyone has managed to unit test Hive scripts and could
>>> share their experience? My first thought was to prepare sample data, run
>>> the Hive scripts to generate output, and then compare the generated output
>>> with the expected output. Sounds fairly simple, but it may be a bit
>>> complicated if the data is read from S3 and stored in S3.
>>> I was also wondering if anyone has managed to run the tests on EMR? I found
>>> this simple framework which may help with testing EMR:
>>> http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
>>> However, I am tempted to run tests on a real EMR rather than doing it
>>> locally.
>>> I am planning to integrate those tests with Jenkins (formerly Hudson).
>>> Many thanks,
>>> Radek
>> The process you described of diffing output is exactly how Hive's
>> current unit testing works. It has its upsides, being that it is good
>> for catching regressions, but the downside is that it is not really
>> programmatic. Look for .q files in the hive source and their
>> corresponding results/q.out files.
>>
>> Edward
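
For reference, the output-diffing approach discussed in this thread (run the script on sample data, then compare what it produced with a stored expected result) can be sketched roughly as below. This is only an illustration under stated assumptions: it does not run Hive itself, it assumes the query output has already been captured as text, and the function names `diff_against_expected` and `check_query_output` are hypothetical, not part of Hive or QTestUtil.

```python
import difflib
from pathlib import Path
from typing import List


def diff_against_expected(actual_text: str, expected_text: str) -> List[str]:
    """Return unified-diff lines; an empty list means the outputs match.

    Hypothetical helper: 'actual_text' is whatever the Hive script produced,
    'expected_text' is the stored golden output (similar in spirit to a
    .q.out file).
    """
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def check_query_output(actual_path: Path, expected_path: Path) -> List[str]:
    """Same comparison, reading both outputs from files on local disk."""
    return diff_against_expected(
        actual_path.read_text(), expected_path.read_text()
    )


if __name__ == "__main__":
    # Tab-separated rows, as a Hive query might emit them (sample data only).
    expected = "a\t1\nb\t2\n"
    actual = "a\t1\nb\t3\n"
    for line in diff_against_expected(actual, expected):
        print(line)
```

A test runner (or a Jenkins job) could fail the build whenever the returned list is non-empty and print the diff lines as the failure message. If the data lives in S3, the outputs would first need to be copied somewhere the test can read them; that step is left out here.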