Alex, if not, you can try using the functions coalesce(n) or repartition(n). As per the API, coalesce will not make a shuffle, but repartition will.
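A minimal sketch of the difference (Scala, Spark 1.x DataFrame API; the DataFrame df and the target count of 8 are just assumptions for illustration):

    // repartition(n) performs a full shuffle and can either increase
    // or decrease the number of partitions.
    val reshuffled = df.repartition(8)

    // coalesce(n) merges existing partitions without a shuffle, so it
    // can only decrease the number of partitions.
    val merged = df.coalesce(8)

    // Verify the resulting partition count.
    println(merged.rdd.partitions.length)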
Regards.

2015-10-16 0:52 GMT+01:00 Mohammed Guller <moham...@glassbeam.com>:

> You may find the spark.sql.shuffle.partitions property useful. The default
> value is 200.
>
> Mohammed
>
> *From:* Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
> *Sent:* Wednesday, October 14, 2015 8:14 PM
> *To:* user
> *Subject:* dataframes and numPartitions
>
> A lot of RDD methods take a numPartitions parameter that lets you specify
> the number of partitions in the result. For example, groupByKey.
>
> The DataFrame counterparts don't have a numPartitions parameter, e.g.
> groupBy only takes a bunch of Columns as params.
>
> I understand that the DataFrame API is supposed to be smarter and go
> through a LogicalPlan, and perhaps determine the optimal number of
> partitions for you, but sometimes you want to specify the number of
> partitions yourself. One such use case is when you are preparing to do a
> "merge" join with another dataset that is similarly partitioned with the
> same number of partitions.
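P.S. Regarding the spark.sql.shuffle.partitions suggestion quoted above, a minimal sketch (assuming a Spark 1.x SQLContext named sqlContext):

    // Controls how many partitions shuffles produce in DataFrame
    // operations such as joins and aggregations (default is 200).
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")

Note that this is a session-wide setting rather than a per-operation parameter like numPartitions on the RDD API.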