Re: Hash Partitioning and Dataframes

2015-07-17 Thread Stephen Boesch
al data. Thanks for the info. Ron From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Friday, May 08, 2015 3:15 PM To: Daniel, Ronald (ELS-SDG)

Re: Hash Partitioning and Dataframes

2015-05-09 Thread Michael Armbrust
d (ELS-SDG) Cc: user@spark.apache.org Subject: Re: Hash Partitioning and Dataframes What are you trying to accomplish? Internally Spark SQL will add Exchange operators to make sure that data is partitioned correctly for joins and aggregations. If you are g

RE: Hash Partitioning and Dataframes

2015-05-08 Thread Daniel, Ronald (ELS-SDG)
, May 08, 2015 3:15 PM To: Daniel, Ronald (ELS-SDG) Cc: user@spark.apache.org Subject: Re: Hash Partitioning and Dataframes What are you trying to accomplish? Internally Spark SQL will add Exchange operators to make sure that data is partitioned correctly for joins and aggregations. If you are

Re: Hash Partitioning and Dataframes

2015-05-08 Thread Michael Armbrust
What are you trying to accomplish? Internally Spark SQL will add Exchange operators to make sure that data is partitioned correctly for joins and aggregations. If you are going to do other RDD operations on the result of dataframe operations and you need to manually control the partitioning, call

Hash Partitioning and Dataframes

2015-05-08 Thread Daniel, Ronald (ELS-SDG)
Hi, How can I ensure that a batch of DataFrames I make are all partitioned based on the value of one column common to them all? For RDDs I would partitionBy a HashPartitioner, but I don't see that in the DataFrame API. If I partition the RDDs that way, then do a toDF(), will the partitioning be
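
For readers new to the idea behind the question, here is a minimal sketch of what hash partitioning on a shared column achieves. It is plain Python, not Spark code: the helper names (partition_for, hash_partition, keys_by_partition), the "doc_id" column, and the sample rows are all hypothetical, and CRC32 is only a stand-in for whatever hash function a real partitioner uses. The point it illustrates is that when two datasets are partitioned by the same function on the same key, rows with equal keys land in the same partition index, which is what lets a join or aggregation proceed without moving data again.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Return a stable partition index for a key.

    Assumption: CRC32 of the key's repr stands in for a real hash function.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def hash_partition(rows, key_column, num_partitions):
    """Split rows (dicts) into num_partitions lists by the hash of one column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[partition_for(row[key_column], num_partitions)].append(row)
    # Return a dense list so that empty partitions are still represented.
    return [buckets[i] for i in range(num_partitions)]


def keys_by_partition(partitions, key_column):
    """Return the set of key values found in each partition."""
    return [{row[key_column] for row in part} for part in partitions]


# Two hypothetical datasets that share a "doc_id" column.
authors = [{"doc_id": i, "author": f"author-{i}"} for i in range(8)]
citations = [{"doc_id": i % 8, "cited": i} for i in range(20)]

authors_parts = hash_partition(authors, "doc_id", 4)
citations_parts = hash_partition(citations, "doc_id", 4)

# Equal keys end up at the same partition index in both datasets.
co_partitioned = (
    keys_by_partition(authors_parts, "doc_id")
    == keys_by_partition(citations_parts, "doc_id")
)
print("co-partitioned:", co_partitioned)
```

Whether Spark keeps such a layout across a conversion to a DataFrame is exactly what the thread is asking, and this sketch does not answer that.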