Alex, if not, you can try using the functions coalesce(n) or repartition(n). As per the API, coalesce will not make a shuffle, but repartition will.
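A minimal sketch of the difference (Scala, Spark 1.x DataFrame API; the DataFrame df and the target count of 8 are just assumptions for illustration):

    // repartition(n) performs a full shuffle and can either increase
    // or decrease the number of partitions.
    val reshuffled = df.repartition(8)

    // coalesce(n) merges existing partitions without a shuffle, so it
    // can only decrease the number of partitions.
    val merged = df.coalesce(8)

    // Verify the resulting partition count.
    println(merged.rdd.partitions.length)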
Regards.

2015-10-16 0:52 GMT+01:00 Mohammed Guller <moham...@glassbeam.com>:

> You may find the spark.sql.shuffle.partitions property useful. The default
> value is 200.
>
> Mohammed
>
> *From:* Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
> *Sent:* Wednesday, October 14, 2015 8:14 PM
> *To:* user
> *Subject:* dataframes and numPartitions
>
> A lot of RDD methods take a numPartitions parameter that lets you specify
> the number of partitions in the result. For example, groupByKey.
>
> The DataFrame counterparts don't have a numPartitions parameter, e.g.
> groupBy only takes a bunch of Columns as params.
>
> I understand that the DataFrame API is supposed to be smarter and go
> through a LogicalPlan, and perhaps determine the optimal number of
> partitions for you, but sometimes you want to specify the number of
> partitions yourself. One such use case is when you are preparing to do a
> "merge" join with another dataset that is similarly partitioned with the
> same number of partitions.
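P.S. Regarding the spark.sql.shuffle.partitions suggestion quoted above, a minimal sketch (assuming a Spark 1.x SQLContext named sqlContext):

    // Controls how many partitions shuffles produce in DataFrame
    // operations such as joins and aggregations (default is 200).
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")

Note that this is a session-wide setting rather than a per-operation parameter like numPartitions on the RDD API.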