
Re: apache-spark doesn't work correctly with Russian alphabet

2017-01-18 Thread Sergey B.

Try to get the encoding right. E.g., if you read from `csv` or other sources, specify the encoding, which is most likely `cp1251`:

`df = sqlContext.read.csv(filePath, encoding="cp1251")`

On the Linux command line, the file's encoding can be detected with the `chardet` utility.

On Wed, Jan 18, 2017 at 3:53 PM, AlexModestov wrote:
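Why the read encoding matters can be shown without Spark. The following is a minimal standard-library Python sketch, not Spark code: it writes a small CSV file in `cp1251` (a Windows Cyrillic encoding) and reads it back twice, once with the matching encoding and once as UTF-8. The helper names, file name and sample rows are hypothetical. In Spark the corresponding fix is the `encoding` argument to `read.csv` quoted above.

```python
import csv
import os
import tempfile

def write_csv(path, rows, encoding):
    # Hypothetical helper: write rows to a CSV file using the given encoding
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)

def read_csv(path, encoding):
    # Hypothetical helper: read the CSV back, decoding it with `encoding`.
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    # so a wrong guess shows up as garbled text rather than an exception.
    with open(path, encoding=encoding, errors="replace", newline="") as f:
        return list(csv.reader(f))

# Illustrative sample data containing Cyrillic text
rows = [["id", "city"], ["1", "Москва"]]

path = os.path.join(tempfile.mkdtemp(), "sample.csv")
write_csv(path, rows, encoding="cp1251")  # file is stored as cp1251 bytes

right = read_csv(path, encoding="cp1251")  # matching encoding
wrong = read_csv(path, encoding="utf-8")   # mismatched encoding

print(right == rows)  # True
print(wrong == rows)  # False: the Cyrillic bytes are not valid UTF-8
```

The ASCII columns survive either way; only the non-ASCII text is damaged by the mismatch, which is why such problems often show up only in the Russian fields.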

apache-spark doesn't work correctly with Russian alphabet

2017-01-18 Thread AlexModestov

I want to use Apache Spark to work with text data. The data contains some Russian characters, but Apache Spark shows me strings that look like "...\u0413\u041e\u0420\u041e...". What should I do to correct them?

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com
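The escapes quoted in the question can be checked directly. This standard-library Python sketch (variable names are illustrative) decodes the `\uXXXX` sequences and shows that they are Cyrillic capital letters, and that Python's `ascii()` yields the same escaped form. One possibility worth ruling out, assuming the strings were displayed via `repr` (for example as elements of a list or `Row` printed in Python 2, whose `repr` adds a `u` prefix but escapes the same way), is that the text was decoded correctly and is only being shown escaped.

```python
import unicodedata

# The escaped text quoted in the question, taken literally (backslashes included)
escaped = r"\u0413\u041e\u0420\u041e"

# Interpret each \uXXXX escape as a Unicode code point
decoded = escaped.encode("ascii").decode("unicode_escape")

# Name each character; all four are in the Cyrillic block (the letters ГОРО)
print([unicodedata.name(ch) for ch in decoded])
# ['CYRILLIC CAPITAL LETTER GHE', 'CYRILLIC CAPITAL LETTER O', 'CYRILLIC CAPITAL LETTER ER', 'CYRILLIC CAPITAL LETTER O']

# ascii() escapes non-ASCII characters exactly as in the question's output
print(ascii(decoded) == "'" + escaped + "'")  # True
```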