Re: writing a small csv to HDFS is super slow

kathy Harayama Fri, 22 Mar 2019 16:09:20 -0700

Hi Lian,
Since you are using repartition(1), are you trying to decrease the number of
partitions? If so, have you tried using coalesce instead?
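
For reference, here is a minimal sketch of the coalesce variant. It assumes
Spark 2.x or later, where the built-in 'csv' source can stand in for
'com.databricks.spark.csv'. The helper name write_single_csv is made up for
illustration; df and path are whatever you already have in your job.

```python
def write_single_csv(df, path):
    """Write df as a single CSV part file under path (hypothetical helper).

    coalesce(1) merges the existing partitions without the full shuffle
    that repartition(1) performs.
    """
    (
        df.coalesce(1)              # narrow dependency: no shuffle stage
        .write
        .format("csv")              # assumption: built-in CSV source (Spark 2.x+)
        .mode("overwrite")
        .option("header", "true")
        .save(path)
    )
```

One caveat: because coalesce(1) does not add a shuffle boundary, Spark may run
the upstream computation in a single task as well. If the work that produces
df is the expensive part, repartition(1) can actually be the faster option, so
it is worth timing both.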


Kathleen

On Fri, Mar 22, 2019 at 2:43 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi,
>
> Writing a csv to HDFS takes about 1 hour:
>
>
> df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)
>
> The generated CSV file is only about 150 KB. The job uses 3 containers (13
> cores, 23 GB of memory).
>
> Other people have reported similar issues, but I haven't found a good
> explanation or solution.
>
> Any pointers would be highly appreciated! Thanks.
>
>
>
