Hi Pradeep,

Here is one way to partition your data into different files, by calling
repartition() on the DataFrame:
df.repartition(12, $"Month")
  .write
  .format(...)

This assumes you want to partition by a "Month" column that has 12 distinct
values. Each partition will be stored in a separate file (but in the same
folder).
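
Pulling the thread together: the original question asked for one directory
per distinct column value, which is what partitionBy on the DataFrameWriter
provides (supported for CSV in Spark 2.0+'s built-in writer, unlike the
spark-csv package mentioned below). A minimal sketch; the input/output paths,
app name, and "Month" column are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-example").getOrCreate()
import spark.implicits._

// Assumed: df has a "Month" column with 12 distinct values.
val df = spark.read.parquet("/path/to/input")  // illustrative input

df.repartition(12, $"Month")  // one in-memory partition per month value
  .write
  .partitionBy("Month")       // writes one Month=<value>/ directory per value
  .option("sep", "\t")        // tab-delimited output
  .csv("/tmp/output")         // illustrative output path
```

With repartition on the same column as partitionBy, each Month=<value>/
directory should contain a single data file rather than one file per task.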

Xinh

On Tue, May 10, 2016 at 2:10 AM, Mail.com <pradeep.mi...@mail.com> wrote:

> Hi,
>
> I don't want to reduce partitions. I need to write files depending on the
> column value.
>
> I am trying to understand how reducing the number of partitions will make
> it work.
>
> Regards,
> Pradeep
>
> On May 9, 2016, at 6:42 PM, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
> Hi,
>
> It is supported; try using coalesce(1) (I may have the spelling wrong) and
> after that do the partitions.
>
> Regards,
> Gourav
>
> On Mon, May 9, 2016 at 7:12 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>
>> Hi,
>>
>> I have to write tab delimited file and need to have one directory for
>> each unique value of a column.
>>
>> I tried using spark-csv with partitionBy, and it seems it is not supported.
>> Is there any other option available for doing this?
>>
>> Regards,
>> Pradeep
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>