What I typically do is use row_number and a subquery to filter based on that.
It works out pretty well and avoids iterating over the full result. I think an
offset solution based on windowing directly would be useful.
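
A minimal sketch of that row_number approach (the table name `logs` and ordering column `ts` are illustrative; the window needs an explicit ORDER BY for the offset to be deterministic):

```sql
-- Emulate MySQL's LIMIT 5, 10 (skip 5 rows, return the next 10).
SELECT *
FROM (
  SELECT t.*, row_number() OVER (ORDER BY ts) AS rn
  FROM logs t
) ranked
WHERE rn > 5 AND rn <= 15;
```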

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, Jul 4, 2014 at 2:00 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> Doing an offset is actually pretty expensive in a distributed query
> engine, so in many cases it probably makes sense to just collect and then
> perform the offset as you are doing now.  This is unless the offset is very
> large.
>
> Another limitation here is that HiveQL does not support OFFSET. That said,
> if you want to open a JIRA, we would consider implementing it.
>
>
> On Wed, Jul 2, 2014 at 1:37 PM, durin <m...@simon-schaefer.net> wrote:
>
>> Hi,
>>
>> in many SQL DBMSs like MySQL, you can set an offset for the LIMIT clause,
>> s.t. LIMIT 5, 10 will return 10 rows, starting from row 5.
>>
>> As far as I can see, this is not possible in Spark-SQL.
>> The best solution I have found to imitate that (using Scala) is converting
>> the RDD into an Array via collect() and then using a for-loop to return
>> certain elements from that Array.
>>
>>
>>
>>
>> Is there a better solution regarding performance and are there plans to
>> implement an offset for LIMIT?
>>
>>
>> Kind regards,
>> Simon
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/LIMIT-with-offset-in-SQL-queries-tp8673.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
