Hi, I'm curious why it's common for data to be repartitioned to 1 partition when saving ml models:
    sqlContext.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)

This shows up in most ML models I've seen, for example:

- Word2Vec <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala#L314>
- PCA <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala#L189>
- LDA <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala#L605>

Am I missing some benefit of repartitioning like this?

Thanks,
--
Asher Krim
Senior Software Engineer