Though I'll note that window functions are not yet supported in Spark SQL. https://issues.apache.org/jira/browse/SPARK-1442
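For reference, once SPARK-1442 lands, the row_number approach Mayur describes below would look roughly like this. This is only a sketch: it assumes a HiveContext/SQLContext named sqlContext, and the table "t" and its sort column "id" are made-up names; the syntax assumes standard HiveQL window functions.

    // Hypothetical once window functions work (SPARK-1442): number the
    // rows in sort order, then filter on the row number to emulate
    // LIMIT 5, 10 (i.e. rows 6 through 15).
    val page = sqlContext.sql("""
      SELECT * FROM (
        SELECT *, row_number() OVER (ORDER BY id) AS rn FROM t
      ) numbered
      WHERE rn > 5 AND rn <= 15
    """)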
On Fri, Jul 4, 2014 at 6:59 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> What I typically do is use row_number and a subquery to filter based on
> that. It works out pretty well and reduces the iteration. I think an
> offset solution based directly on windowing would be useful.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
> On Fri, Jul 4, 2014 at 2:00 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Doing an offset is actually pretty expensive in a distributed query
>> engine, so in many cases it probably makes sense to just collect and
>> then perform the offset as you are doing now, unless the offset is
>> very large.
>>
>> Another limitation here is that HiveQL does not support OFFSET. That
>> said, if you want to open a JIRA, we would consider implementing it.
>>
>> On Wed, Jul 2, 2014 at 1:37 PM, durin <m...@simon-schaefer.net> wrote:
>>
>>> Hi,
>>>
>>> in many SQL DBMSs such as MySQL, you can set an offset in the LIMIT
>>> clause, so that LIMIT 5, 10 returns 10 rows, starting from row 5.
>>>
>>> As far as I can see, this is not possible in Spark SQL. The best way
>>> I have found to imitate it (using Scala) is to convert the RDD into
>>> an Array via collect() and then use a for-loop to return certain
>>> elements from that Array.
>>>
>>> Is there a better solution in terms of performance, and are there
>>> plans to implement an offset for LIMIT?
>>>
>>> Kind regards,
>>> Simon
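In the meantime, the collect-based workaround Michael and Simon discuss above looks roughly like this in Scala (again just a sketch, with the same hypothetical sqlContext, table "t", and column "id"):

    // Cap the scan with LIMIT, pull the rows to the driver, then drop
    // the offset locally. Only sensible when offset + count is small,
    // because everything up to the limit is collected on the driver.
    val offset = 5
    val count  = 10
    val rows = sqlContext
      .sql(s"SELECT * FROM t ORDER BY id LIMIT ${offset + count}")
      .collect()     // Array[Row] of at most offset + count rows
      .drop(offset)  // skip the first `offset` rows, keep the next `count`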