Re: Experimental results using TPC-DS (versus Spark and Presto)

Dongwon Kim Mon, 30 Jan 2017 15:51:07 -0800

Goun: I did that just to make all the engines use the same data, and I usually
store data in ORC. I know that this can bias the results in favor of
Hive. I did run Spark experiments with Parquet, and Spark works better
with Parquet, as is commonly believed (those runs are not included in the results, though).


Goden: Oops, it's 128GB of main memory for the master and all the slaves,
for sure, because I'm using 80GB on each node.

Gopal: The output of (yarn logs -application $APPID) doesn't contain any line
with HISTORY in it, so the script doesn't produce an SVG file. Should I turn on
some option to get the HISTORY lines into the YARN application
log?
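Since the swimlanes script only has HISTORY lines to draw from, one quick check is to filter a saved copy of the `yarn logs` output the same way a `grep HISTORY` would and see whether anything survives. A minimal Python sketch, assuming the log has already been saved to a local file; the sample lines in it are placeholders, not real Tez log lines:

```python
import io
from typing import Iterable, List, TextIO


def history_lines(log: Iterable[str], marker: str = "HISTORY") -> List[str]:
    """Keep only lines containing the marker, like `grep HISTORY` would."""
    return [line.rstrip("\n") for line in log if marker in line]


def check_log(log: TextIO) -> List[str]:
    """Return the HISTORY lines, warning when there are none to plot."""
    lines = history_lines(log)
    if not lines:
        # With no HISTORY lines there is nothing to build a swimlane SVG from.
        print("no HISTORY lines found; no SVG can be produced")
    return lines


if __name__ == "__main__":
    # Hypothetical sample input standing in for a saved `yarn logs` dump.
    sample = io.StringIO(
        "placeholder line without the marker\n"
        "placeholder HISTORY event line\n"
    )
    print(len(check_log(sample)))  # prints 1
```

Run against the real saved log, an empty result would confirm that the HISTORY events never reached the aggregated application log, rather than pointing to a problem with the script itself.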

2017-01-31 4:47 GMT+09:00 Goden Yao <goden...@apache.org>:
> Was the master 128MB or 128GB of memory?
>
>
> On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>>
>>
>> > Hive LLAP shows better performance than Presto and Spark for most
>> > queries, but it shows very poor performance on the execution of query 72.
>>
>> My suspicion is the inventory x catalog_sales x warehouse join,
>> assuming the column statistics are present and valid.
>>
>> If you could send the EXPLAIN FORMATTED plans and the swimlanes for LLAP, I
>> can probably debug this better.
>>
>>
>> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
>>
>> Use the app ID from the "submitted to <appid>" line with this script to get the diagram.
>>
>> Cheers,
>> Gopal
>>
>>
> --
> Goden
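Getting the diagram starts with the application ID from the "submitted to <appid>" line mentioned above. A small Python sketch of pulling that ID out of captured query output, assuming the line contains the literal text "submitted to" followed somewhere by a standard YARN ID of the form application_<clusterTimestamp>_<sequence>; the exact wording of the surrounding log line is an assumption here:

```python
import re
from typing import Optional

# Standard YARN application ID shape: application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def find_app_id(output: str) -> Optional[str]:
    """Return the first app ID found on a line mentioning 'submitted to'."""
    for line in output.splitlines():
        if "submitted to" in line:
            match = APP_ID_RE.search(line)
            if match:
                return match.group(0)
    # No matching line: the ID has to be looked up some other way.
    return None
```

The returned ID is what would then be handed to yarn-swimlanes.sh; lines that merely contain an application ID without the "submitted to" marker are ignored on purpose, so an unrelated ID elsewhere in the output is not picked up.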
