Can you post the EXPLAIN output too?
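
For example, assuming you are querying through beeline or the Thrift server
(the "rows selected" output suggests so), EXPLAIN EXTENDED will print both the
logical and physical plans for the query:

explain extended
select meta['is_bad'] as is_bad, count(*) as count, avg(nvar['var1']) as avg
from test
where date between '2014-04-01' and '2014-04-30'
group by meta['is_bad'];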

On Tue, May 12, 2015 at 12:11, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Hi,
>
> I have a SQL query on tables containing big Map columns (thousands of
> keys). I found it to be very slow.
>
> select meta['is_bad'] as is_bad, count(*) as count, avg(nvar['var1']) as avg
> from test
> where date between '2014-04-01' and '2014-04-30'
> group by meta['is_bad']
>
> =>
>
> +---------+-----------+-----------------------+
> | is_bad  |   count   |          avg          |
> +---------+-----------+-----------------------+
> | 0       | 17024396  | 0.16257395850742645   |
> | 1       | 179729    | -0.37626256661125485  |
> | 2       | 28128     | 0.11674427263203344   |
> | 3       | 116327    | -0.6398689187187386   |
> | 4       | 87715     | -0.5349632960030563   |
> | 5       | 169771    | 0.40812641191854626   |
> | 6       | 542447    | 0.5238256418341465    |
> | 7       | 160324    | 0.29442847034840386   |
> | 8       | 2099      | -0.9165701665162977   |
> | 9       | 3104      | 0.3845685004598235    |
> +---------+-----------+-----------------------+
> 10 rows selected (130.5 seconds)
>
>
> The total number of rows is less than 20M. Why is it so slow?
>
> I'm running on Spark 1.4.0-SNAPSHOT with 100 executors, each with 4 GB of
> RAM and 2 CPU cores.
>
> Looks like https://issues.apache.org/jira/browse/SPARK-5446 is still
> open; when can we have it fixed? :)
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
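
In the meantime, one way to narrow down where the time goes (a rough sketch,
not tested on your data; the table name test_flat is made up) is to flatten
the two map lookups into plain columns once, then re-run the aggregate on the
flattened table:

create table test_flat as
select date, meta['is_bad'] as is_bad, nvar['var1'] as var1
from test;

select is_bad, count(*) as count, avg(var1) as avg
from test_flat
where date between '2014-04-01' and '2014-04-30'
group by is_bad;

If the flattened version is fast, the time is going into reading and
deserializing the big maps rather than into the aggregation itself.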
