Oh, thanks. Makes sense to me.
Best,
Sun.
fightf...@163.com
From: Takeshi Yamamuro
Date: 2016-02-04 16:01
To: fightf...@163.com
CC: user
Subject: Re: Re: About cache table performance in spark sql
Hi,
Parquet data are column-wise and highly compressed, so the size of the deserialized
rows in memory can be much larger than the size of the Parquet file on disk.
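
As an illustration, here is a minimal spark-shell sketch for comparing the
on-disk and in-memory sizes, assuming a Spark 1.6-era sqlContext and a
hypothetical registered table name "events" (not from this thread):

    // Cache the table through Spark SQL's in-memory columnar store.
    sqlContext.cacheTable("events")
    sqlContext.sql("SELECT COUNT(*) FROM events").collect()  // force materialization

    // Report how much memory (and disk) the cached relation actually occupies.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: mem=${info.memSize} bytes, disk=${info.diskSize} bytes")
    }

    sqlContext.uncacheTable("events")
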
From Impala I get the overall
Parquet file size of about 24.59GB. Would be good to have some correction on
this.
Best,
Sun.
fightf...@163.com
From: Prabhu Joseph
Date: 2016-02-04 14:35
To: fightf...@163.com
CC: user
Subject: Re: About cache table performance in spark sql
Sun,
When the Executor doesn't have enough memory and it tries to cache the
data, it spends a lot of time on GC and hence the job will be slow. Either,
1. We should allocate enough memory to cache all the RDDs and hence the job
will complete fast,
Or 2. Don't use cache when there is not enough Executor memory (see the
related sketch below).
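
A minimal spark-shell sketch of a middle ground between these two options,
assuming Spark 1.6-era APIs and a hypothetical table name "events": persisting
with MEMORY_AND_DISK_SER keeps the cached data serialized and lets partitions
that don't fit spill to local disk instead of pressuring the GC (option 1
amounts to raising the executor heap at submit time, e.g. with the
--executor-memory flag):

    import org.apache.spark.storage.StorageLevel

    // Hypothetical table name; replace with the real registered table.
    val df = sqlContext.table("events")

    // Serialized, spill-to-disk cache: partitions that don't fit in memory are
    // written to local disk rather than repeatedly thrashing a too-small heap.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)
    df.count()  // action that materializes the cache
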
Hi,
I want to make sure that the cache table indeed would accelerate sql queries.
Here is one of my use cases:
Impala table size: 24.59GB, no partitions, with about 1 billion+ rows.
I use sqlContext.sql to run queries over this table and try the cache and
uncache commands to see if there is any difference in query performance.
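
For reference, a minimal spark-shell sketch of that cache/uncache comparison,
assuming a Spark 1.6-era sqlContext and a hypothetical table name "events"
(the time() helper is ad hoc, not a Spark API):

    // Ad hoc timer for a single query run.
    def time[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(s"$label: ${(System.nanoTime() - start) / 1e9} s")
      result
    }

    time("uncached") { sqlContext.sql("SELECT COUNT(*) FROM events").collect() }

    sqlContext.cacheTable("events")
    // The first run after cacheTable pays the cost of building the in-memory cache.
    sqlContext.sql("SELECT COUNT(*) FROM events").collect()
    time("cached") { sqlContext.sql("SELECT COUNT(*) FROM events").collect() }

    sqlContext.uncacheTable("events")
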