Hi Radek,
I'm actually in the process of running the map-join unit tests against
EMR as we speak. It's possible but dog slow :)
Thanks,
Kirk
On 2/18/11 11:09 AM, Edward Capriolo wrote:
On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek
<radek.macias...@gmail.com> wrote:
Hello,
I was wondering if anyone has managed to unit test Hive scripts and could
share their experience? My first thought was to prepare sample data, run Hive
scripts in order to generate output and then compare the generated output
with the expected output. Sounds fairly simple but it may be a bit
complicated if the data is read from S3 and stored in S3.
I was also wondering if anyone managed to run the tests on EMR? I found this
simple framework which may help with testing EMR:
http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
However, I am tempted to run the tests on a real EMR cluster rather than
running them locally.
I am planning to integrate those tests with Jenkins (formerly Hudson).
Many thanks,
Radek
The process you described of diffing output is exactly how Hive's
current unit testing works. Its upside is that it is good
for catching regressions, but the downside is that it is not really
programmatic. Look for .q files in the Hive source and their
corresponding .q.out files in the results directories.
Edward
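
For illustration, the diff-based approach discussed in this thread could be
sketched roughly as below. This is a minimal sketch, not Hive's own test
harness: it assumes a `hive` CLI is available on the PATH, and the function
names (run_hive_script, diff_against_expected, check_script) and the file
layout are hypothetical.

```python
import difflib
import subprocess


def run_hive_script(script_path):
    """Run a Hive script with the local `hive` CLI and return its stdout.

    Assumes `hive` is on the PATH; -S is silent mode, -f runs a script file.
    """
    result = subprocess.run(
        ["hive", "-S", "-f", script_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_against_expected(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="expected", tofile="actual", lineterm="",
    ))


def check_script(script_path, expected_path, runner=run_hive_script):
    """Run one script and compare its output with a stored expected file.

    `runner` is injectable so the comparison can be exercised without Hive.
    """
    with open(expected_path) as f:
        expected = f.read()
    return diff_against_expected(runner(script_path), expected)
```

A test would then assert that check_script("some_query.q", "some_query.q.out")
returns an empty list, and print the returned diff lines on failure. Passing a
different `runner` is one way to point the same check at a remote cluster
instead of a local install.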