Hi all,

Let me clarify the problem: 

Suppose we have a simple table `A` with 100,000,000 records.

Problem:
When we execute the SQL query `SELECT * FROM A LIMIT 500`,
it scans through all 100,000,000 records.
The expected behaviour is that the engine stops scanning as soon as 500 records have been found.
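
To make the expected behaviour concrete, here is a minimal sketch (assuming a Spark 2.0 spark-shell session with table `A` already registered; the table name is just a placeholder):

    // RDD.take(n) shows the early stop we expected: it launches jobs over
    // successively more partitions and stops as soon as n rows have been
    // collected, instead of scanning the whole table.
    val firstRows = spark.sql("SELECT * FROM A").rdd.take(500)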

Detailed observation:
We found that there are “GlobalLimit / LocalLimit” physical operators defined here:
https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
 
However, during query plan generation, GlobalLimit / LocalLimit are not applied to 
the query plan.
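
A quick way to check this is to print the plans the planner actually generates (same assumptions as the sketch above):

    // Print parsed, analyzed, optimized, and physical plans for the query.
    // We expected GlobalLimit / LocalLimit (or CollectLimit, which Xiao
    // describes below) to show up in the physical plan.
    val df = spark.sql("SELECT * FROM A LIMIT 500")
    df.explain(true)
    println(df.queryExecution.executedPlan)  // final physical plan only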

Could you please help us investigate this LIMIT problem? 
Thanks.

Best,
Liz
> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
> 
> Hi, Liz,
> 
> CollectLimit means `Take the first `limit` elements and collect them to a 
> single partition.`
> 
> Thanks,
> 
> Xiao 
> 
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
> Hi all,
> 
> I found that the runtime for a query with or without the “LIMIT” keyword is the 
> same. We looked into it and found that there actually are “GlobalLimit / 
> LocalLimit” operators in the logical plan, but no corresponding operators in the 
> physical plan. Is this a bug or something else? Attached are the logical and 
> physical plans from running "SELECT * FROM seq LIMIT 1".
> 
> 
> More specifically, we expected an early stop once enough results had been collected.
> Thanks so much.
> 
> Best,
> Liz
