Performance guide for small files?

2017-01-31 Thread Marco Garcia
Hi guys, We’re working on a project with a lot of small data in dimension-like tables. The project is using Cloudera 5.6. Is there any performance guide or set of best practices that we could adopt? Thanks, Marco Garcia

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-31 Thread Prasanth Jayachandran
Hi Dongwon, Thanks for the presentation! Very insightful. I just filed a bug for query72. Hive’s CBO seems to be selecting the wrong join order. https://issues.apache.org/jira/browse/HIVE-15771 In the following link you can find a rewrite for the query which gives much better runtime (in my testing