Spark Data Frame. PreSorted partitions

Николай Ижиков Sun, 03 Dec 2017 22:51:07 -0800

Cross-posting from @user.

Hello, guys!


I am working on an implementation of a custom DataSource for the Spark
DataFrame API and have a question:

If I have a `SELECT * FROM table1 ORDER BY some_column` query, I can sort the
data inside each partition in my data source.

Is there a built-in option to tell Spark that the data in each partition is
already sorted?

It seems that Spark could benefit from partitions that are already sorted,
for example by using a distributed merge sort algorithm.
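
A minimal sketch of the merge step this refers to, in plain Python rather than
Spark. It assumes every partition is already sorted in ascending order; the
function name `merge_sorted_partitions` and the sample partitions are
hypothetical, made up for illustration. It is not a Spark API and says nothing
about what Spark does internally; it only shows why pre-sorted partitions are
cheap to combine.

```python
import heapq
from typing import Iterable, Iterator


def merge_sorted_partitions(partitions: Iterable[Iterable[int]]) -> Iterator[int]:
    """Lazily merge partitions that are each already sorted ascending.

    heapq.merge keeps only one pending element per partition in a heap,
    so no partition has to be re-sorted or fully loaded up front.
    """
    return heapq.merge(*partitions)


# Hypothetical data: three partitions, each sorted by the data source.
partitions = [[1, 4, 9], [2, 3, 10], [0, 7, 8]]

merged = list(merge_sorted_partitions(partitions))
print(merged)  # [0, 1, 2, 3, 4, 7, 8, 9, 10]
```

With k partitions and n rows in total, this merge costs O(n log k)
comparisons, against O(n log n) for sorting everything from scratch.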

Does that make sense to you?


28.11.2017 18:42, Michael Artz wrote:

I'm not sure, other than retrieving from a Hive table that is already sorted.
This sounds cool though; I would be interested to know this as well.


