Attached you will find a project with unit tests showing the issue at hand.
If I read in an ISO-8859-1 encoded file and simply write out what was read, the contents of the part file match what was read, which is great. However, as soon as I use a map or mapPartitions function, the encoding appears to be incorrect. In addition, a simple collectAsList followed by writing that list of strings to a file does not work either.

I don't think I'm doing anything wrong. Can someone please investigate? I think this is a bug.

spark-sandbox.zip <http://apache-spark-user-list.1001560.n3.nabble.com/file/t7751/spark-sandbox.zip>
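To illustrate the kind of corruption I mean, here is a minimal sketch in plain Java, without Spark. It rests on an assumption I have not confirmed: that the map / collectAsList path turns the raw bytes into Java Strings using UTF-8, while the plain read-and-write path passes the bytes through untouched. The class and method names (EncodingRoundTrip, roundTrip) are hypothetical and are not taken from the attached project.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {

    // Decode bytes into a String with one charset, then encode the String
    // back to bytes with another. This mimics what happens when text is
    // materialized as Java Strings and later written out again.
    static byte[] roundTrip(byte[] input, Charset decodeAs, Charset encodeAs) {
        String text = new String(input, decodeAs);
        return text.getBytes(encodeAs);
    }

    public static void main(String[] args) {
        // "café" as ISO-8859-1 bytes; the é is the single byte 0xE9.
        byte[] original = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Same charset on both sides: the bytes survive unchanged.
        byte[] matching = roundTrip(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 in and out preserved: "
                + Arrays.equals(original, matching));

        // Assumed failure mode: ISO-8859-1 bytes decoded as UTF-8.
        // 0xE9 is not a valid UTF-8 sequence here, so it is replaced
        // during decoding and the original byte cannot be recovered.
        byte[] mismatched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 decode preserved: "
                + Arrays.equals(original, mismatched));
    }
}
```

If this assumption holds, it would explain why the pass-through case looks fine while anything that touches the lines as Strings does not, and why the charset would need to be given explicitly both when reading and when writing the collected list.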