ORC works well with Presto too, at least. Can you explain a little how you ran the 1TB benchmark on a Presto cluster with only 5 * 80GB = 400GB of total memory? Did you use compression to fit it all in memory, or partitioned data, etc.?
On Mon, Jan 30, 2017 at 3:50 PM Dongwon Kim <eastcirc...@gmail.com> wrote:
> Goun : Just to make all the engines use the same data, and I usually
> store data in ORC. I know that it can bias the results in favor of
> Hive. I did the Spark experiments with Parquet, and Spark works better
> with Parquet, as is believed (not included in the results, though).
>
> Goden : Oops, 128GB of main memory for the master and all the slaves,
> for sure, because I'm using 80GB per node.
>
> Gopal : (yarn logs -applicationId $APPID) doesn't contain a line
> containing HISTORY, so it doesn't produce an svg file. Should I turn on
> some option to get the HISTORY lines into the yarn application log?
>
> 2017-01-31 4:47 GMT+09:00 Goden Yao <goden...@apache.org>:
> > Was the master 128MB or 128GB of memory?
> >
> > On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <gop...@apache.org>
> > wrote:
> >>
> >> > Hive LLAP shows better performance than Presto and Spark for most
> >> > queries, but it shows very poor performance on the execution of
> >> > query 72.
> >>
> >> My suspicion would be the inventory x catalog_sales x warehouse join,
> >> assuming the column statistics are present and valid.
> >>
> >> If you could send the explain formatted plans and swimlanes for LLAP, I
> >> can probably debug this better.
> >>
> >> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
> >>
> >> Use the "submitted to <appid>" in this to get the diagram.
> >>
> >> Cheers,
> >> Gopal
> >>
> > --
> > Goden
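For anyone following the swimlane discussion above, the log-extraction step that Gopal's yarn-swimlanes.sh script automates can be sketched roughly as follows. This is a hedged sketch, not the script itself: the application id below is hypothetical, and the HISTORY events only show up in the aggregated YARN logs when Tez is configured to log history into the containers (e.g. via SimpleHistoryLoggingService) — which is likely why Dongwon sees no HISTORY lines.

```shell
# Sketch: pull Tez HISTORY events out of the aggregated YARN logs and
# render a swimlane SVG with the swimlane.py tool that lives alongside
# yarn-swimlanes.sh in tez-tools/swimlanes.
#
# Assumption: the job ran with history logging routed into the container
# logs, e.g. tez.history.logging.service.class set to
# org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService

APPID=application_1485770000000_0001   # hypothetical application id

# Keep only the HISTORY event lines from the aggregated logs.
yarn logs -applicationId "$APPID" | grep HISTORY > "$APPID.hist"

# Turn the event log into a swimlane diagram.
python swimlane.py -o "$APPID.svg" "$APPID.hist"
```

The `grep HISTORY` filter is the part that fails silently when history logging is off: the pipeline produces an empty file, and no SVG can be generated from it.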