As for 'rdd.zipWithIndex.partitionBy(YourCustomPartitioner)': can I just drop
some records inside my custom partitioner? Otherwise I still have to call
rdd.take() to get exactly 10 records.
And repartition is THE expensive operation that I want to work around.
Actually, what I expect the limit...
It can be easily done using an RDD.
rdd.zipWithIndex.partitionBy(YourCustomPartitioner) should give you your
items.
Here YourCustomPartitioner will know how to pick sample items from each
partition.
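To make that concrete, here is a minimal sketch of the idea, assuming the
policy is simply "the first n indices are the picked items". TakeNPartitioner
and takeN are illustrative names, not Spark APIs. Note that partitionBy is
only defined on key/value RDDs, so the (value, index) pairs produced by
zipWithIndex have to be swapped first:

  import org.apache.spark.Partitioner
  import org.apache.spark.rdd.RDD
  import scala.reflect.ClassTag

  // Illustrative partitioner: route the first n indices to partition 0
  // and spread everything else over the remaining partitions.
  class TakeNPartitioner(n: Long, override val numPartitions: Int) extends Partitioner {
    require(numPartitions >= 2, "need at least one extra partition for the leftovers")
    def getPartition(key: Any): Int = {
      val idx = key.asInstanceOf[Long]
      if (idx < n) 0 else 1 + (idx % (numPartitions - 1)).toInt
    }
  }

  // partitionBy is only available on (key, value) RDDs, so swap the
  // (value, index) pairs that zipWithIndex produces before repartitioning.
  def takeN[T: ClassTag](rdd: RDD[T], n: Long, parts: Int): RDD[T] =
    rdd.zipWithIndex()                      // RDD[(T, Long)]
      .map(_.swap)                          // RDD[(Long, T)], index as key
      .partitionBy(new TakeNPartitioner(n, parts))
      .filter { case (idx, _) => idx < n }  // a partitioner can only route records, never drop them
      .values

The filter at the end also answers the question above: a Partitioner can only
assign a record to a partition, it cannot drop it, so the trimming still has
to happen in a filter (or a take) afterwards.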
If you want to stick to DataFrames, you can always repartition the data after
you apply the limit.
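For the DataFrame route, a minimal sketch (limitThenSpread is an illustrative
name, and the partition count of 8 is arbitrary):

  import org.apache.spark.sql.DataFrame

  // limit collapses its result into a single partition, so repartition
  // afterwards to restore parallelism for the downstream stages.
  def limitThenSpread(df: DataFrame, n: Int): DataFrame =
    df.limit(n).repartition(8)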
Suppose I have the above scenario without using the limit clause; will it
then work by checking across all the partitions?
On Dec 24, 2015 9:26 AM, "汪洋" wrote:
> I see.
>
> Thanks.
>
>
> On Dec 24, 2015, at 11:44 AM, Zhan Zhang wrote:
>
> There has to be a central point to collaboratively collect exactly 10
> records...
There has to be a central point to collaboratively collect exactly 10
records; currently the approach is to use one single partition, which is easy
to implement.
Otherwise, the driver has to count the number of records in each partition and
then decide how many records to materialize from each.
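For what it's worth, a rough sketch of that alternative (illustrative only,
this is not how Spark implements limit, and distributedLimit is a hypothetical
helper): one job counts each partition, the driver computes a per-partition
quota, and a second job materializes only the quota:

  import org.apache.spark.rdd.RDD
  import scala.reflect.ClassTag

  def distributedLimit[T: ClassTag](rdd: RDD[T], n: Long): Array[T] = {
    // First pass: how many records live in each partition?
    val counts: Array[(Int, Long)] = rdd
      .mapPartitionsWithIndex { (i, it) => Iterator((i, it.size.toLong)) }
      .collect()
      .sortBy(_._1)

    // Driver decides how many records to materialize from each partition.
    var remaining = n
    val quota: Map[Int, Long] = counts.map { case (i, c) =>
      val take = math.min(c, remaining)
      remaining -= take
      i -> take
    }.toMap

    // Second pass: each partition emits only its quota.
    rdd.mapPartitionsWithIndex { (i, it) => it.take(quota(i).toInt) }.collect()
  }

Note this costs two jobs over the RDD (so the input would usually need to be
cached), which is exactly the extra coordination the single-partition approach
avoids.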
It is an application running as an HTTP server, so I collect the data as the
response.
> On Dec 24, 2015, at 8:22 AM, Hudong Wang wrote:
>
> When you call collect() it will bring all the data to the driver. Do you mean
> to call persist() instead?
>
> From: tiandiwo...@icloud.com
> Subject: Problem using...