Hello, has anyone managed to unit test Hive scripts, and if so, could you share your experience? My first thought was to prepare sample data, run the Hive scripts against it to generate output, and then compare the generated output with the expected output. That sounds fairly simple, but it may get complicated when the data is read from S3 and the results are written back to S3.
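To make the idea concrete, here is a minimal Python sketch of the "run the script, then compare output" approach. It assumes the `hive` CLI is on the PATH and that the script prints its result rows to stdout as tab-separated text; the helper names (`run_hive_script`, `parse_rows`, `diff_rows`) are made up for illustration and are not part of any framework.

```python
import subprocess
from collections import Counter


def run_hive_script(script_path, hivevars=None, hive_bin="hive"):
    """Run a Hive script in silent mode and return its stdout as text.

    Assumes the `hive` CLI is installed; `hivevars` is a dict of
    variables passed to the script via --hivevar name=value.
    """
    cmd = [hive_bin, "-S"]
    for name, value in (hivevars or {}).items():
        cmd += ["--hivevar", f"{name}={value}"]
    cmd += ["-f", script_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_rows(text):
    """Split tab-separated output into row tuples, skipping blank lines."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]


def diff_rows(actual_text, expected_text):
    """Return (missing, unexpected) rows.

    Row order is ignored, since Hive does not guarantee it without
    ORDER BY, but duplicate rows are still counted.
    """
    actual = Counter(parse_rows(actual_text))
    expected = Counter(parse_rows(expected_text))
    missing = list((expected - actual).elements())
    unexpected = list((actual - expected).elements())
    return missing, unexpected
```

A test would then call `run_hive_script` on the script under test, read the expected rows from a file, and assert that `diff_rows` returns two empty lists. If the script took its input and output locations as variables (for example a hypothetical `${hivevar:INPUT}`), the same script could point at small sample data in tests and at S3 in production.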
I was also wondering whether anyone has managed to run such tests on EMR. I found this simple framework, which may help with testing: http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html However, I am tempted to run the tests on a real EMR cluster rather than locally. I am planning to integrate those tests with Jenkins (formerly Hudson). Many thanks, Radek