Re: RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

2017-06-09 Thread Barona, Ricardo
From: "Manjunath, Kiran"
Date: Friday, June 9, 2017 at 1:47 PM
To: "Barona, Ricardo", "user@spark.apache.org"
Subject: Re: RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

Can you post your code and sample input? That should help us understand

Re: RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

2017-06-09 Thread Manjunath, Kiran
Can you post your code and sample input? That should help us understand if there is a bug in the code written or with the platform.

Regards,
Kiran

From: "Barona, Ricardo"
Date: Friday, June 9, 2017 at 10:47 PM
To: "user@spark.apache.org"
Subject: RDD saveAsText and

RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

2017-06-09 Thread Barona, Ricardo
In Spark 1.6.0 I’m having an issue with saveAsText and write.mode.text. I have a data frame with 1M+ rows, and then I do:

dataFrame.limit(500).map(_.mkString("\t")).toDF("row").write.mode(SaveMode.Overwrite).text("myHDFSFolder/results")

Then, when I check the results file, I see 900+ rows