Performance guide for small files?
Hi guys, we’re working on a project with a lot of small data in tables, such as dimension tables. The project uses Cloudera 5.6. Is there any performance guide or set of best practices that we could adopt? Thanks, Marco Garcia
Re: Experimental results using TPC-DS (versus Spark and Presto)
Hi Dongwon, thanks for the presentation! Very insightful. I just filed a bug for query72. Hive’s CBO seems to be selecting the wrong join order. https://issues.apache.org/jira/browse/HIVE-15771 In the following link you can find a rewrite of the query that gives a much better runtime (in my testing