Hive is great for the massive transformations needed in ETL-type processing and for full-data-set analytics. Impala is better suited to fast analytical queries that return a tiny subset of the original data set. Both are improving in concurrency and latency; however, they have a long way to go to beat commercial MPP solutions in performance and stability. Their key advantages are storage economics and flexibility (schema on read).

On Apr 27, 2015, at 6:27 AM, Anilkumar Kalshetti <anilkalshe...@gmail.com> wrote:

Hi Ashok,

You can also now use Spark as the execution engine for Hive. Please check the HiveOnSpark [HoS] project. Ref link: <https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started>

Thanks

On 27 April 2015 at 15:22, Fabio C. <anyte...@gmail.com> wrote:

If the comparison mentions just MR, then it is probably outdated. Hive can now run on Tez, with a great improvement in performance. However, I don't know how Hive+Tez compares with Impala.

On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

What use case are you trying to solve?

On Mon, Apr 27, 2015 at 2:16 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:

Hi gurus,

Kindly help me understand the advantage that Impala has over Hive. I read a note that Impala does not use the MapReduce engine and is therefore very fast for queries compared to Hive. However, Hive, as I understand it, is widely used everywhere!

Thank you

--
Nitin Pawar