Goun: I used ORC just so that all the engines read the same data, and because I usually store data in ORC. I know this can bias the results in favor of Hive. I also ran the Spark experiments with Parquet, and Spark does work better with Parquet, as is commonly believed (those runs are not included in the results, though).

Goden: Oops, it is 128GB of main memory for the master and all the slaves, for sure, because I'm using 80GB on each node.

Gopal: The output of (yarn logs -application $APPID) has no lines containing HISTORY, so the script doesn't produce an svg file. Should I turn on some option to get the lines containing HISTORY into the YARN application log?

2017-01-31 4:47 GMT+09:00 Goden Yao <goden...@apache.org>:
> was the master 128MB or 128GB memory?
>
>
> On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>>
>>
>> > Hive LLAP shows better performance than Presto and Spark for most
>> > queries, but it shows very poor performance on the execution of query 72.
>>
>> My suspicion will be the inventory x catalog_sales x warehouse join -
>> assuming the column statistics are present and valid.
>>
>> If you could send the explain formatted plans and swimlanes for LLAP, I
>> can probably debug this better.
>>
>>
>> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
>>
>> Use the "submitted to <appid>" in this to get the diagram.
>>
>> Cheers,
>> Gopal
>>
>>
> --
> Goden