As for 'rdd.zipWithIndex.partitionBy(YourCustomPartitioner)': can I actually drop 
some records inside my custom partitioner? Otherwise I still have to call 
rdd.take() to get exactly 10000 records.
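
For what it's worth, here is a minimal sketch of the index-filtering route I am 
considering (the RDD and the 10000 limit are just placeholders from my case). It 
stays distributed instead of pulling everything back to the driver the way 
rdd.take() does:

    // Keep only the first 10000 records by filtering on the global index:
    // no partitionBy shuffle, no collect to the driver. Note zipWithIndex
    // itself runs one extra Spark job to count the preceding partitions.
    val limit = 10000L
    val limited = rdd.zipWithIndex()             // RDD[(T, Long)]
      .filter { case (_, idx) => idx < limit }
      .map(_._1)                                 // back to RDD[T]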

And repartition is exactly the expensive operation that I want to work around.

Actually, what I would expect the limit clause to do is use some kind of 
coordinator to assign each partition a number of records to keep, such that the 
per-partition quotas sum to exactly the limit. But it seems this cannot be easily done.
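
Something like this rough sketch is what I have in mind. It assumes a first pass 
that counts every partition (one extra job, but no shuffle); 'rdd', the limit, 
and the helper name are all placeholders of mine:

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    // Rough sketch of the "coordinator" idea: count each partition first,
    // then give each partition a quota so the quotas sum to the limit.
    def limitWithoutShuffle[T: ClassTag](rdd: RDD[T], limit: Int): RDD[T] = {
      // Pass 1: per-partition record counts, gathered on the driver.
      val counts = rdd
        .mapPartitionsWithIndex { (i, it) => Iterator((i, it.size)) }
        .collect()
        .sortBy(_._1)
        .map(_._2)

      // Driver-side "coordinator": partition i keeps min(count, remaining).
      val cumBefore = counts.scanLeft(0)(_ + _)  // records before partition i
      val quotas = counts.indices.map { i =>
        math.max(0, math.min(counts(i), limit - cumBefore(i)))
      }.toArray

      // Pass 2: each partition keeps only its quota; no shuffle involved.
      rdd.mapPartitionsWithIndex { (i, it) => it.take(quotas(i)) }
    }

The downside is the extra counting pass, and the quotas array has to be small 
enough to ship in the closure, but the result is exactly 'limit' records without 
either a shuffle or a take() to the driver.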

> On Dec 25, 2015, at 11:10 PM, manasdebashiskar [via Apache Spark User List] 
> <ml-node+s1001560n25797...@n3.nabble.com> wrote:
> 
> It can be easily done using an RDD. 
> 
> rdd.zipWithIndex.partitionBy(YourCustomPartitioner) should give you your 
> items. 
> Here YourCustomPartitioner will know how to pick sample items from each 
> partition. 
> 
> If you want to stick to DataFrames you can always repartition the data after 
> you apply the limit. 
> 
> ..Manas 
> 
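
(For reference, my reading of the DataFrame suggestion above is roughly the 
following, where 'df' and the partition count of 8 are placeholders. limit() in 
Spark SQL typically collapses the result to a single partition, and the 
repartition afterwards is exactly the shuffle I was hoping to avoid.)

    // limit() typically yields a single-partition result in Spark SQL,
    // so repartition afterwards to restore parallelism (a full shuffle).
    val limited = df.limit(10000).repartition(8)   // 8 is arbitrary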



