Re: Question regarding structured data and partitions

2016-07-07 Thread tan shai
Thank you for your answer. Since Spark 1.6.0, it is possible to partition a DataFrame using hash partitioning with repartition (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame). I have also sorted a DataFrame, and it is using range partitioning in the physic
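
To make the difference between the two schemes mentioned above concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the function names (hash_partition, range_bounds, range_partition) are hypothetical, and the way the bounds are picked (evenly spaced positions in an already sorted key list) is an assumption chosen for illustration only.

```python
from bisect import bisect_left


def hash_partition(key, num_partitions):
    """Assign a key to a partition by hashing (illustrative only)."""
    # Python's % yields a non-negative result for a positive modulus.
    return hash(key) % num_partitions


def range_bounds(sorted_keys, num_partitions):
    """Pick num_partitions - 1 upper bounds from an already sorted key list.

    Assumption: bounds are taken at evenly spaced positions, so each
    partition receives roughly the same number of keys.
    """
    n = len(sorted_keys)
    if num_partitions < 1 or num_partitions > n:
        raise ValueError("need 1 <= num_partitions <= number of keys")
    return [sorted_keys[(i * n) // num_partitions - 1]
            for i in range(1, num_partitions)]


def range_partition(key, bounds):
    """Return the index of the first partition whose upper bound is >= key."""
    return bisect_left(bounds, key)


keys = list(range(1, 13))          # 1..12, already sorted
bounds = range_bounds(keys, 3)     # upper bounds of partitions 0 and 1
by_range = [range_partition(k, bounds) for k in keys]
```

With range partitioning, the bounds describe which key interval each partition holds, which is the kind of information asked about in this thread. With hash partitioning there are no bounds to report, only the number of partitions.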

Re: Question regarding structured data and partitions

2016-07-07 Thread Koert Kuipers
since dataframes represent more or less a plan of execution, they do not have partition information as such i think? you could however do dataFrame.rdd, to force it to create a physical plan that results in an actual rdd, and then query the rdd for partition info. On Thu, Jul 7, 2016 at 4:24 AM, t
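
As a rough illustration of what "querying for partition info" could produce once the data is materialized, here is a sketch in plain Python. It assumes each partition is available as an ordinary list of comparable values (in PySpark, rdd.glom().collect() returns a similar list of lists, but nothing below depends on Spark). The name partition_summary is hypothetical.

```python
def partition_summary(partitions):
    """Return (index, row_count, min_value, max_value) for each partition.

    Assumption: `partitions` is a list of lists of comparable values.
    Empty partitions report None for both bounds.
    """
    summary = []
    for index, rows in enumerate(partitions):
        if rows:
            summary.append((index, len(rows), min(rows), max(rows)))
        else:
            summary.append((index, 0, None, None))
    return summary


# Three partitions, the middle one empty.
info = partition_summary([[1, 2, 3], [], [7, 9]])
```

Collecting every partition to the driver is only reasonable for small data; on a real cluster the same per-partition reduction would normally run on the executors.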

Re: Question regarding structured data and partitions

2016-07-07 Thread tan shai
Using partitioning with dataframes, how can we retrieve information about partitions? Partition bounds, for example. Thanks, Shaira 2016-07-07 6:30 GMT+02:00 Koert Kuipers : > spark does keep some information on the partitions of an RDD, namely the > partitioning/partitioner. > > GroupSorted is

Re: Question regarding structured data and partitions

2016-07-06 Thread Koert Kuipers
spark does keep some information on the partitions of an RDD, namely the partitioning/partitioner. GroupSorted is an extension for key-value RDDs that also keeps track of the ordering, allowing for faster joins, non-reduce type operations on very large groups of values per key, etc. see here: http
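
To illustrate why knowing the ordering can make joins cheaper, here is a minimal merge-join sketch in plain Python. It is not GroupSorted's code and does not use Spark. It only assumes two lists of (key, value) pairs that are already sorted by key, and the name sorted_merge_join is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sorted_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs already sorted by key.

    Because both inputs are sorted, one forward pass over each side is
    enough; no hash table over a whole side has to be built.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    l = next(left_groups, None)
    r = next(right_groups, None)
    joined = []
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_groups, None)      # left key has no match yet
        elif l[0] > r[0]:
            r = next(right_groups, None)     # right key has no match yet
        else:
            # Same key on both sides: emit every combination of values.
            left_vals = [v for _, v in l[1]]
            right_vals = [v for _, v in r[1]]
            joined.extend((l[0], lv, rv)
                          for lv in left_vals for rv in right_vals)
            l = next(left_groups, None)
            r = next(right_groups, None)
    return joined


result = sorted_merge_join(
    [(1, "a"), (2, "b"), (2, "c"), (4, "d")],
    [(2, "x"), (3, "y"), (4, "z")],
)
```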