Hi, I am a HIVE user working on analytical applications over large data sets. For us, HIVE performance is critical to the success of our product. I was wondering whether any recent improvements have been made in the optimizer layer. One relevant reference I found on the web is the HIVE paper (http://infolab.stanford.edu/~ragho/hive-icde2010.pdf). If you can send me any pointers on current enhancements, that would be great.
Some specific improvements I am looking for are:
1. Cost-based optimization (logical or physical)
2. "multi-query optimization techniques and performing generic n-way joins in a single map-reduce job" (quoted from the future work section of the paper above)
3. Generating and using table statistics to produce better plans, faster execution, etc. I know some code was added to generate column statistics for HIVE tables. Is any other statistics generation available?

Thanks for your help,
-Sukhendu