Hi,
I am using Spark SQL in a way like this:
sqlContext.sql("select * from table limit 10000").map(...).collect()
The problem is that the limit clause collects all 10,000 records into a
single partition, so the map afterwards runs in only one partition and is
really slow.
It is an application running as an HTTP server, so I collect the data as the
response.
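
Would repartitioning after the limit help? A rough sketch of what I mean
(assuming the Spark 1.x DataFrame API; the table name, the partition count of
16, and the body of the map are placeholders, not my real code):

    val limited = sqlContext.sql("select * from table limit 10000")
    val result = limited
      .repartition(16)               // spread the single post-limit partition across 16 partitions
      .map(row => row.getString(0))  // placeholder transformation; adjust to the real schema
      .collect()

This would add a shuffle, but the map would then run on 16 partitions instead
of one.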
> On Dec 24, 2015, at 8:22 AM, Hudong Wang wrote:
>
> When you call collect() it will bring all the data to the driver. Do you mean
> to call persist() instead?
>
> Thanks.
>
> Zhan Zhang
>
>
>
> On Dec 23, 2015, at 5:32 PM, 汪洋 <tiandiwo...@icloud.com> wrote:
>
>> It is an application running as an HTTP server, so I collect the data as
>> the response.
>>
>>> On Dec 24, 2015, at 8:22 AM, Hudong Wang wrote: