I'm not sure of a way to do this, other than reading from a Hive table that is already sorted. This sounds cool though; I'd be interested to know the answer as well.

On Nov 28, 2017 10:40 AM, "Николай Ижиков" <nizhikov....@gmail.com> wrote:

> Hello, guys!
>
> I am working on an implementation of a custom DataSource for the Spark
> DataFrame API and have a question:
>
> If I have a `SELECT * FROM table1 ORDER BY some_column` query, I can sort
> the data inside each partition in my data source.
>
> Is there a built-in option to tell Spark that the data in each partition
> is already sorted?
>
> It seems that Spark could benefit from already sorted partitions,
> for example by using a distributed merge sort algorithm.
>
> Does that make sense to you?
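
To make the merge idea in the quoted question concrete, here is a minimal sketch in plain Python. It does not use any Spark API and is not how Spark plans an `ORDER BY`; the `merge_sorted_partitions` helper and the sample rows are hypothetical, made up only for illustration. It assumes each partition is already sorted on the same key, in which case a k-way merge can produce the global order without re-sorting any partition.

```python
import heapq


def merge_sorted_partitions(partitions, key=None):
    """Lazily merge partitions that are each already sorted by `key`.

    Hypothetical helper: heapq.merge performs a k-way merge, so it only
    compares the current head of each partition instead of re-sorting.
    """
    return heapq.merge(*partitions, key=key)


# Hypothetical partitions, each pre-sorted on the first field
# (standing in for `some_column` from the question).
partitions = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e")],
    [(3, "c"), (6, "f"), (8, "h")],
]

merged = list(merge_sorted_partitions(partitions, key=lambda row: row[0]))
```

With k partitions holding n rows in total, the merge costs roughly O(n log k) comparisons, versus O(n log n) for sorting everything from scratch, which is the kind of saving the question is asking about.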