Hi Lian,
Since you are using repartition(1), do you want to reduce the number of
partitions? If so, have you tried using coalesce instead?

Kathleen

On Fri, Mar 22, 2019 at 2:43 PM Lian Jiang <jiangok2...@gmail.com> wrote:

> Hi,
>
> Writing a csv to HDFS takes about 1 hour:
>
>
> df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)
>
> The generated csv file is only about 150kb. The job uses 3 containers (13
> cores, 23g mem).
>
> Other people have reported similar issues, but I haven't seen a good
> explanation or solution.
>
> Any clue is highly appreciated! Thanks.
>
>
>
