Hi Radek,

I've been using the MiniMRCluster and MiniDFSCluster to run unit tests locally. 
That has been giving me decent cycle time. I have fixture tables in my 
test/resources which I can load into the MiniHiveCluster as part of test setup. 
I loosely based my code on QTestUtil in the Hive trunk (which I could not 
figure out how to use directly).

Andrew

On Feb 18, 2011, at 3:21 PM, Kirk True wrote:

> Hi Radek,
> 
> I'm actually in the process of running the map-join unit tests against 
> EMR as we speak. It's possible but dog slow :)
> 
> Thanks,
> Kirk
> 
> On 2/18/11 11:09 AM, Edward Capriolo wrote:
>> On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek
>> <radek.macias...@gmail.com>  wrote:
>>> Hello,
>>> I was wondering if anyone has managed to unit test Hive scripts and could
>>> share his/her experience? My first thought was to prepare sample data, run Hive
>>> scripts in order to generate output and then compare the generated output
>>> with the expected output. Sounds fairly simple but it may be a bit
>>> complicated if the data is read from S3 and stored in S3.
>>> I was also wondering if anyone managed to run the tests on EMR? I found this
>>> simple framework which may help with testing EMR:
>>> http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
>>> However I am tempted to run tests on a real EMR rather than doing it
>>> locally.
>>> I am planning to integrate those tests with Jenkins (formerly Hudson).
>>> Many thanks,
>>> Radek
>> The process you described of diffing output is exactly how Hive's
>> current unit testing works. Its upside is that it is good
>> for catching regressions, but the downside is that it is not really
>> programmatic. Look for .q files in the Hive source and their
>> corresponding results/q.out files.
>> 
>> Edward
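
For anyone reading this thread later, here is a minimal sketch of the
output-diffing approach described above: compare the text a script produced
against a stored expected file and fail on any difference. This is not Hive's
own test harness. The function names (diff_against_expected, normalize) are
hypothetical, and it assumes the query output has already been captured as
text by whatever runs your script.

```python
import difflib


def normalize(text):
    """Sort rows so the comparison does not depend on row order.

    Assumption: the query under test has no ORDER BY, so row order
    is not meaningful. Skip this step when order matters.
    """
    return "\n".join(sorted(text.splitlines()))


def diff_against_expected(actual_text, expected_text):
    """Return unified-diff lines; an empty list means the outputs match."""
    expected = expected_text.splitlines()
    actual = actual_text.splitlines()
    return list(
        difflib.unified_diff(
            expected, actual, fromfile="expected", tofile="actual", lineterm=""
        )
    )


if __name__ == "__main__":
    # Hypothetical tab-separated rows standing in for real query output.
    expected = "a\t1\nb\t2\n"
    actual = "b\t2\na\t1\n"
    diff = diff_against_expected(normalize(actual), normalize(expected))
    if diff:
        print("\n".join(diff))
        raise SystemExit(1)
    print("output matches expected")
```

A check like this returns a non-zero exit code on mismatch, so it should be
straightforward to call from a Jenkins job. The expected file plays the same
role as the .q.out files mentioned above.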
