Hi Lian,

Since you are using repartition(1), do you want to decrease the number of partitions? If so, have you tried using coalesce instead? Unlike repartition, coalesce reduces the number of partitions without a full shuffle.
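Here is a minimal PySpark sketch of that suggestion. It assumes Spark 2.0+, where CSV is a built-in source, so `.csv()` stands in for `format('com.databricks.spark.csv')`. `write_single_csv` is a hypothetical helper name, not a Spark API. It keeps your overwrite mode and header option and only swaps repartition(1) for coalesce(1):

```python
def write_single_csv(df, path):
    """Write a PySpark DataFrame as a single CSV part file under `path`.

    Hypothetical helper for illustration, assuming Spark 2.0+ (built-in
    CSV writer). Spark still creates a directory at `path`; with one
    partition it contains one part file.
    """
    (
        df.coalesce(1)              # merge partitions without a full shuffle
        .write
        .mode("overwrite")          # same save mode as the original job
        .option("header", "true")   # write column names as the first row
        .csv(path)
    )
```

You would call it as `write_single_csv(df, csv)` with the same `df` and output path as in your snippet. One caveat from the Spark docs: a drastic coalesce such as coalesce(1) can push the upstream computation onto a single task as well, so if the slow part is computing `df` rather than writing it, repartition(1) may still come out ahead.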
Kathleen

On Fri, Mar 22, 2019 at 2:43 PM Lian Jiang <jiangok2...@gmail.com> wrote:
> Hi,
>
> Writing a csv to HDFS takes about 1 hour:
>
> df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)
>
> The generated csv file is only about 150kb. The job uses 3 containers (13 cores, 23g mem).
>
> Other people have similar issues but I don't see a good explanation and solution.
>
> Any clue is highly appreciated! Thanks.