Doing an offset is actually pretty expensive in a distributed query engine, because the engine still has to produce (and then discard) all of the skipped rows, typically after gathering them onto a single node. So in many cases it probably makes sense to just collect the result and perform the offset locally, as you are doing now, unless the offset is very large.
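For example, something along these lines should work. This is just a sketch; it assumes a SQLContext named sqlContext, a registered table called logs with an id column to order by, and that offset + count is small enough to collect to the driver:

  val offset = 5
  val count  = 10

  // Pull only the first offset + count rows from the cluster,
  // then slice off the leading `offset` rows locally.
  val rows = sqlContext
    .sql(s"SELECT * FROM logs ORDER BY id LIMIT ${offset + count}")
    .collect()                          // Array[Row], small thanks to the LIMIT
    .slice(offset, offset + count)      // keep rows offset .. offset + count - 1

Note that an ORDER BY (or some other deterministic ordering) is needed for the offset to be meaningful at all.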
Another limitation here is that HiveQL does not support OFFSET. That said, if you want to open a JIRA we would consider implementing it.

On Wed, Jul 2, 2014 at 1:37 PM, durin <m...@simon-schaefer.net> wrote:

> Hi,
>
> in many SQL DBMSs like MySQL, you can set an offset for the LIMIT clause,
> s.t. /LIMIT 5, 10/ will return 10 rows, starting from row 5.
>
> As far as I can see, this is not possible in Spark SQL.
> The best solution I have to imitate that (using Scala) is converting the
> RDD into an Array via collect() and then using a for-loop to return
> certain elements from that Array.
>
> Is there a better solution regarding performance, and are there plans to
> implement an offset for LIMIT?
>
> Kind regards,
> Simon
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/LIMIT-with-offset-in-SQL-queries-tp8673.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.