On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek <radek.macias...@gmail.com> wrote:
> Hello,
> I was wondering if anyone managed to unit test Hive scripts and share
> his/her experience? My first thought was to prepare sample data, run hive
> scripts in order to generate output and then compare the generated output
> with the expected output. Sounds fairly simple but it may be a bit
> complicated if the data is read from S3 and stored in S3.
> I was also wondering if anyone managed to run the tests on EMR? I found this
> simple framework which may help with testing EMR:
> http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
> However I am tempted to run tests on a real EMR rather than doing it
> locally.
> I am planning to integrate those tests with Jenkins (formerly Hudson).
> Many thanks,
> Radek
The process you described of diffing output is exactly how Hive's current unit testing works. It has its upsides, in that it is good for catching regressions, but the downside is that it is not really programmatic. Look for the .q files in the Hive source and their corresponding results/q.out files.

Edward
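For anyone wiring this up in Jenkins, the "run the script, diff against a golden file" loop can be sketched roughly as below. This is not Hive's own test harness, just a minimal illustration under a few assumptions: the `hive` CLI is on the PATH, `hive -f <file>` runs a script and prints query results to stdout, and the expected output lives in a plain text file next to each script. The helper names (`run_hive_script`, `diff_against_expected`, `check_script`) are hypothetical.

```python
from __future__ import annotations

import difflib
import subprocess
import sys
from pathlib import Path


def run_hive_script(script: Path) -> list[str]:
    """Run a Hive script via the CLI (assumed on PATH) and return its stdout lines."""
    result = subprocess.run(
        ["hive", "-f", str(script)],
        capture_output=True,
        text=True,
        check=True,  # a failing query raises CalledProcessError
    )
    return result.stdout.splitlines()


def diff_against_expected(actual: list[str], expected: list[str], name: str = "output") -> list[str]:
    """Return unified-diff lines between expected and actual; an empty list means they match."""
    return list(
        difflib.unified_diff(
            expected,
            actual,
            fromfile=f"{name}.expected",
            tofile=f"{name}.actual",
            lineterm="",
        )
    )


def check_script(script: Path, expected_file: Path) -> list[str]:
    """Run one script and compare its output with a golden file (hypothetical layout)."""
    actual = run_hive_script(script)
    expected = expected_file.read_text().splitlines()
    return diff_against_expected(actual, expected, name=script.stem)


if __name__ == "__main__":
    # Usage (assumed layout): python check_hive.py my_query.q my_query.q.expected
    diff = check_script(Path(sys.argv[1]), Path(sys.argv[2]))
    print("\n".join(diff))
    sys.exit(1 if diff else 0)  # non-zero exit lets a CI job mark the build as failed
```

One caveat with this style of test: a plain line diff only works if the output is deterministic, so queries under test probably need an explicit ORDER BY (or the output sorted before comparing) to avoid spurious failures.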