Problem using limit clause in spark sql

2015-12-23 Thread
Hi, I am using Spark SQL in a way like this:

    sqlContext.sql("select * from table limit 10000").map(...).collect()

The problem is that the limit clause collects all of the 10,000 records into a single partition, so the map that follows runs in only one partition and is really slow.
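One possible workaround (a sketch only, assuming Spark 1.x, a registered table named "table", and per-row work done in the map; the partition count 8 and the column access are illustrative): repartition the limited result before the expensive map, so the rows pulled into one partition by the limit get redistributed across tasks.

```scala
// Sketch: redistribute the limit'ed rows before doing per-row work.
// "table", the limit value, and the partition count are placeholders.
val limited = sqlContext.sql("select * from table limit 10000")

val result = limited
  .repartition(8)                  // spread the single-partition result across 8 partitions
  .map(row => row.getString(0))    // the expensive per-row work now runs in parallel
  .collect()
```

The repartition adds one shuffle of the limited rows, which is usually cheap compared to running the whole map on a single task.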

Re: Problem using limit clause in spark sql

2015-12-23 Thread
It is an application running as an HTTP server, so I collect the data as the response.

> On Dec 24, 2015, at 8:22 AM, Hudong Wang wrote:
>
> When you call collect() it will bring all the data to the driver. Do you mean to call persist() instead?
>
> From: tiandiwo...@icloud.com
> Subject: Problem using limit clause in spark sql
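Since the rows are served as an HTTP response, some driver-side materialization is needed, but the driver-memory concern raised above can be bounded. A hedged sketch (assuming Spark 1.x and a DataFrame named df; writeToResponse is a hypothetical handler, not part of any API):

```scala
// collect() materializes every row in driver memory at once:
val all = df.collect()                    // Array[Row] held entirely on the driver

// toLocalIterator streams one partition at a time to the driver,
// bounding peak driver memory while still yielding all rows in order:
df.rdd.toLocalIterator.foreach { row =>
  writeToResponse(row)                    // hypothetical: stream each row into the HTTP response
}
```

This trades a single large driver allocation for sequential fetches of one partition at a time, which fits a streamed HTTP response.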

Re: Problem using limit clause in spark sql

2015-12-23 Thread
Thanks.

> Zhan Zhang
>
> On Dec 23, 2015, at 5:32 PM, 汪洋 <tiandiwo...@icloud.com> wrote:
>
>> It is an application running as an http server. So I collect the data as the response.
>>
>>> On Dec 24, 2015, at 8:22 AM, Hudong Wang wrote: