ORC works well with Presto too, at least. Can you explain a little how you ran the 1TB benchmark on a Presto cluster with only 5 * 80GB = 400GB of total memory? Did you use compression to fit it all in memory, or partitioned data, etc.?
On Mon, Jan 30, 2017 at 3:50 PM Dongwon Kim <eastcirc...@gmail.com> wrote:
> Goun : Just to make all the engines use the same data, and I usually
> store data in ORC. I know that it can bias the results in favor of
> Hive. I did the Spark experiments with Parquet, and Spark works better
> with Parquet, as is believed (not included in the results, though).
>
> Goden : Oops, 128GB of main memory for the master and all the slaves,
> for sure, because I'm using 80GB per node.
>
> Gopal : (yarn logs -applicationId $APPID) doesn't contain a line
> containing HISTORY, so it doesn't produce an svg file. Should I turn on
> some option to get the HISTORY lines into the yarn application log?
>
> 2017-01-31 4:47 GMT+09:00 Goden Yao <goden...@apache.org>:
> > Was the master 128MB or 128GB of memory?
> >
> > On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <gop...@apache.org>
> > wrote:
> >>
> >> > Hive LLAP shows better performance than Presto and Spark for most
> >> > queries, but it shows very poor performance on the execution of
> >> > query 72.
> >>
> >> My suspicion would be the inventory x catalog_sales x warehouse join,
> >> assuming the column statistics are present and valid.
> >>
> >> If you could send the explain formatted plans and swimlanes for LLAP, I
> >> can probably debug this better.
> >>
> >> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
> >>
> >> Use the "submitted to <appid>" in this to get the diagram.
> >>
> >> Cheers,
> >> Gopal
> >>
> > --
> > Goden
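For anyone following the swimlane discussion above, the log-extraction step that Gopal's yarn-swimlanes.sh script automates can be sketched roughly as follows. This is a hedged sketch, not the script itself: the application id below is hypothetical, and the HISTORY events only show up in the aggregated YARN logs when Tez is configured to log history into the containers (e.g. via SimpleHistoryLoggingService) — which is likely why Dongwon sees no HISTORY lines.

```shell
# Sketch: pull Tez HISTORY events out of the aggregated YARN logs and
# render a swimlane SVG with the swimlane.py tool that lives alongside
# yarn-swimlanes.sh in tez-tools/swimlanes.
#
# Assumption: the job ran with history logging routed into the container
# logs, e.g. tez.history.logging.service.class set to
# org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService

APPID=application_1485770000000_0001   # hypothetical application id

# Keep only the HISTORY event lines from the aggregated logs.
yarn logs -applicationId "$APPID" | grep HISTORY > "$APPID.hist"

# Turn the event log into a swimlane diagram.
python swimlane.py -o "$APPID.svg" "$APPID.hist"
```

The `grep HISTORY` filter is the part that fails silently when history logging is off: the pipeline produces an empty file, and no SVG can be generated from it.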