Though I'll note that window functions are not yet supported in Spark SQL.
https://issues.apache.org/jira/browse/SPARK-1442


On Fri, Jul 4, 2014 at 6:59 AM, Mayur Rustagi <mayur.rust...@gmail.com>
wrote:

> What I typically do is use row_number in a subquery and then filter on
> that. It works out pretty well and cuts down on iteration. I think an
> offset solution based on windowing directly would be useful.
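>
> For reference, a rough sketch of that row_number pattern in Scala,
> assuming a hypothetical events table ordered by a ts column. Since
> Spark SQL doesn't support window functions yet (SPARK-1442, linked
> above), the query itself would currently have to run against Hive:
>
>   // hypothetical events table; rows 6 through 15 = offset 5, limit 10
>   val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>   val page = hc.hql("""
>     SELECT id, ts FROM (
>       SELECT id, ts, row_number() OVER (ORDER BY ts) AS rn
>       FROM events
>     ) ranked
>     WHERE rn > 5 AND rn <= 15
>   """)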
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Fri, Jul 4, 2014 at 2:00 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Doing an offset is actually pretty expensive in a distributed query
>> engine, so in many cases it probably makes sense to just collect and
>> then perform the offset as you are doing now, unless the offset is
>> very large.
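>>
>> For example, a minimal sketch of that collect-then-offset approach,
>> with hypothetical names (results being whatever sqlContext.sql(...)
>> returned; offset 5, page size 10):
>>
>>   // collect() pulls the entire result set to the driver, so this
>>   // only makes sense when that result set is reasonably small
>>   val rows = results.collect()
>>   val page = rows.drop(5).take(10)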
>>
>> Another limitation here is that HiveQL does not support OFFSET.  That
>> said if you want to open a JIRA we would consider implementing it.
>>
>>
>> On Wed, Jul 2, 2014 at 1:37 PM, durin <m...@simon-schaefer.net> wrote:
>>
>>> Hi,
>>>
>>> in many SQL DBMSs, like MySQL, you can set an offset for the LIMIT
>>> clause, such that LIMIT 5, 10 will return 10 rows, skipping the first 5.
>>>
>>> As far as I can see, this is not possible in Spark SQL.
>>> The best way I have found to imitate that (using Scala) is to convert
>>> the RDD into an Array via collect() and then use a for-loop to return
>>> the desired elements from that Array.
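>>>
>>> Concretely, that workaround looks something like this (names are
>>> hypothetical; offset 5, limit 10):
>>>
>>>   val rows = sqlContext.sql("SELECT name FROM t ORDER BY name").collect()
>>>   // slice out rows 6 through 15 with a for-comprehension
>>>   val page = for (i <- 5 until math.min(15, rows.length)) yield rows(i)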
>>>
>>> Is there a better solution regarding performance and are there plans to
>>> implement an offset for LIMIT?
>>>
>>>
>>> Kind regards,
>>> Simon
>>>
>>>
>>
>>
>
