My terminal can display UTF-8 encoded characters; I already verified that.
But I will double-check again.
Thanks!
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
Is the original file indeed UTF-8? Windows environments in particular tend to mess
up files (e.g. Java on Windows does not use UTF-8 by default). However,
the software that processed the data earlier could also have modified it.
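A quick way to check the point above: inspect the JVM's default charset, and reproduce the failure mode locally. This is a minimal sketch using only the Java standard library; it assumes nothing about your Spark job, and simply shows what happens when Latin-1 bytes are decoded as UTF-8.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsetCheck {
  def main(args: Array[String]): Unit = {
    // On Windows this is often windows-1252 rather than UTF-8.
    println(s"JVM default charset: ${Charset.defaultCharset()}")

    // Failure mode: Latin-1 bytes decoded as UTF-8 turn ø (0xF8) into
    // the replacement character U+FFFD, which many terminals render as '?'.
    val latin1Bytes = "KøBENHAVN".getBytes(StandardCharsets.ISO_8859_1)
    val misread = new String(latin1Bytes, StandardCharsets.UTF_8)
    println(misread) // contains U+FFFD where ø was
  }
}
```

You can force the JVM default with `-Dfile.encoding=UTF-8`, but if the file itself is not UTF-8, that alone will not fix the read.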
> On 10.11.2018 at 02:17, lsn24 wrote:
That doesn't necessarily look like a Spark-related issue. Your
terminal may be displaying the glyph as a question mark because
the font lacks that symbol, maybe?
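One way to distinguish the two cases: print code points rather than glyphs. If the data is intact and only the font is at fault, you will see U+00F8 for ø; if the decode went wrong, you will see U+FFFD (the replacement character) or a literal '?' (U+003F). A small sketch, where the input string is a hypothetical stand-in for whatever `ds.show()` displayed:

```scala
object CodePointDump {
  def main(args: Array[String]): Unit = {
    val shown = "K?BENHAVN" // hypothetical: whatever ds.show() displayed
    // Print each character with its Unicode code point.
    shown.foreach(c => println(f"$c%c = U+${c.toInt}%04X"))
  }
}
```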
On Fri, Nov 9, 2018 at 7:17 PM lsn24 wrote:
Hello,
Per the documentation, the default character encoding of Spark is UTF-8. But
when I try to read non-ASCII characters, Spark tends to read them as question
marks. What am I doing wrong? Below is my syntax:
val ds = spark.read.textFile("a .bz2 file from hdfs");
ds.show();
The string "KøBENHAVN"
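If it turns out the file is actually Latin-1 (ISO-8859-1) rather than UTF-8, the text data source will mangle ø and similar characters, because it always decodes as UTF-8. A workaround is to read the raw Hadoop `Text` records and decode their bytes with an explicit charset. This is a sketch, not a drop-in fix: the path is a placeholder, and ISO-8859-1 is an assumption you should verify against the actual file first. Hadoop's `TextInputFormat` handles the .bz2 decompression transparently.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.sql.SparkSession

object Latin1Read {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._

    val lines = spark.sparkContext
      // Placeholder path; the bz2 codec is applied by the input format.
      .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/file.bz2")
      .map { case (_, text) =>
        // Text holds raw bytes; decode explicitly instead of relying on
        // text.toString, which assumes UTF-8. ISO-8859-1 is an assumption.
        new String(text.getBytes, 0, text.getLength, StandardCharsets.ISO_8859_1)
      }

    lines.toDS().show()
  }
}
```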