Thanks!
arun
On Wed, Apr 8, 2015 at 10:51 AM, java8964 <java8...@hotmail.com> wrote:

> Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop
> effectively supports only Linux, UTF-8 is the only encoding supported, as
> it is the default on Linux.
>
> If you have data in another encoding, you may want to vote for this JIRA:
> https://issues.apache.org/jira/browse/MAPREDUCE-232
>
> Yong
>
> ------------------------------
> Date: Wed, 8 Apr 2015 10:35:18 -0700
> Subject: Reading file with Unicode characters
> From: lists.a...@gmail.com
> To: user@spark.apache.org
> CC: lists.a...@gmail.com
>
> Hi,
>
> Does SparkContext's textFile() method handle files with Unicode
> characters? How about files in UTF-8 format?
>
> Going further, is it possible to specify an encoding to the method? If
> not, what should one do if the files to be read are in some other
> encoding?
>
> Thanks,
> arun
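
A common workaround, in case it helps anyone finding this thread later:
since textFile() decodes every line through Hadoop's Text, whose toString()
assumes UTF-8, you can read the raw Text records with hadoopFile() and
decode the bytes yourself. A minimal Scala sketch, assuming an existing
SparkContext sc; the input path and the ISO-8859-1 charset below are
placeholders, not anything from the thread:

    import java.nio.charset.Charset
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Read (offset, line) pairs as raw Hadoop Text objects instead of
    // Strings, avoiding Text.toString()'s hard-coded UTF-8 decoding.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/input")

    // Decode each line's bytes with the file's actual encoding.
    // Text.getBytes returns the backing array, so only the first
    // getLength bytes are valid.
    val lines = raw.map { case (_, line) =>
      new String(line.getBytes, 0, line.getLength, Charset.forName("ISO-8859-1"))
    }

One caveat: TextInputFormat still finds record boundaries by scanning for
'\n' bytes, so this only helps with encodings where a newline is the single
byte 0x0A (e.g. Latin-1); a UTF-16 file would need a custom InputFormat.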