What I typically do is use row_number and a subquery to filter on that. It works out pretty well and reduces the iteration. I think an offset solution based directly on windowing would be useful.
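The row_number trick mentioned above can be illustrated with a minimal pure-Python sketch (this is not Spark code; `limit_with_offset`, `rows`, and the column names are illustrative, and the query shown in the comment is the kind of ROW_NUMBER subquery the trick corresponds to):

```python
# Pure-Python sketch of emulating LIMIT with an offset via row numbers,
# mirroring a query such as:
#   SELECT * FROM (
#     SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM t
#   ) sub
#   WHERE rn > 5 AND rn <= 15
rows = [{"id": i, "value": f"v{i}"} for i in range(1, 21)]

def limit_with_offset(rows, offset, limit, key):
    # Order the rows, then assign a 1-based rank like ROW_NUMBER()
    ranked = sorted(rows, key=lambda r: r[key])
    # Keep only ranks in the window (offset, offset + limit]
    return [r for rn, r in enumerate(ranked, start=1)
            if offset < rn <= offset + limit]

page = limit_with_offset(rows, offset=5, limit=10, key="id")
print([r["id"] for r in page])  # rows 6..15
```

The subquery computes a rank per row, and the outer filter keeps only the desired window, which is what makes the approach expressible without an OFFSET clause.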
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Jul 4, 2014 at 2:00 AM, Michael Armbrust <mich...@databricks.com> wrote:

> Doing an offset is actually pretty expensive in a distributed query
> engine, so in many cases it probably makes sense to just collect and then
> perform the offset as you are doing now. This is unless the offset is very
> large.
>
> Another limitation here is that HiveQL does not support OFFSET. That said,
> if you want to open a JIRA we would consider implementing it.
>
>
> On Wed, Jul 2, 2014 at 1:37 PM, durin <m...@simon-schaefer.net> wrote:
>
>> Hi,
>>
>> in many SQL DBMSs like MySQL, you can set an offset for the LIMIT clause,
>> s.t. LIMIT 5, 10 will return 10 rows, starting from row 5.
>>
>> As far as I can see, this is not possible in Spark SQL.
>> The best solution I have to imitate that (using Scala) is converting the
>> RDD into an Array via collect() and then using a for-loop to return
>> certain elements from that Array.
>>
>> Is there a better solution regarding performance, and are there plans to
>> implement an offset for LIMIT?
>>
>> Kind regards,
>> Simon
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/LIMIT-with-offset-in-SQL-queries-tp8673.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
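The collect-then-offset workaround described in the quoted messages can be sketched in a few lines of pure Python (again not Spark code; `collected` stands in for the result of an RDD's `collect()`, and the variable names are illustrative). A slice replaces the for-loop over the array:

```python
# Sketch of the workaround: collect the distributed result to the driver,
# then apply the offset and limit locally with a slice.
collected = list(range(100))             # stands in for rdd.collect()
offset, limit = 5, 10
page = collected[offset:offset + limit]  # slicing replaces the for-loop
print(page)
```

As noted in the thread, this is reasonable when the result set (and in particular the offset) is small, since everything before `offset + limit` must still be materialized on the driver.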