Try to get the encoding right. E.g., if you read from CSV or other sources, specify the encoding, which is most probably `cp1251`:
df = sqlContext.read.csv(filePath, encoding="cp1251")
On the Linux command line, the encoding of a file can be detected with the `chardet` utility.
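As a side note, escapes like `\u0413\u041e\u0420\u041e` are just Python's escaped representation of Cyrillic code points, not corruption by themselves; real corruption happens when bytes are decoded with the wrong codec. A minimal sketch in plain Python (no Spark) illustrating both points, using the hypothetical word "ГОРОД" as sample data:

```python
# The escape sequences are ordinary Cyrillic characters when printed:
s = "\u0413\u041e\u0420\u041e"
print(s)  # ГОРО

# Why the encoding argument matters: the same bytes decode very
# differently under cp1251 (correct) vs. latin-1 (wrong).
raw = "ГОРОД".encode("cp1251")   # bytes as they would sit in a cp1251 CSV
print(raw.decode("cp1251"))      # ГОРОД  (correct)
print(raw.decode("latin-1"))     # mojibake such as 'ÃÎÐÎÄ'
```

So if the strings only *display* as `\uXXXX` escapes, the data may already be fine; if you see mojibake, re-read the source with the proper `encoding=` value.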
On Wed, Jan 18, 2017 at 3:53 PM, AlexModestov wrote:
>
> I want to use Apache Spark for working with text data. There are some Russian
> symbols, but Apache Spark shows me strings that look like
> "...\u0413\u041e\u0420\u041e...". What should I do to display them correctly?