Thanks!
arun
On Wed, Apr 8, 2015 at 10:51 AM, java8964 <java8...@hotmail.com> wrote:

> Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop
> effectively supports only Linux, UTF-8 is the only encoding supported, as
> it is the default on Linux.
>
> If you have data in another encoding, you may want to vote for this JIRA:
> https://issues.apache.org/jira/browse/MAPREDUCE-232
>
> Yong
>
> ------------------------------
> Date: Wed, 8 Apr 2015 10:35:18 -0700
> Subject: Reading file with Unicode characters
> From: lists.a...@gmail.com
> To: user@spark.apache.org
> CC: lists.a...@gmail.com
>
> Hi,
>
> Does SparkContext's textFile() method handle files with Unicode
> characters? How about files in UTF-8 format?
>
> Going further, is it possible to specify an encoding to the method? If
> not, what should one do if the files to be read are in some other
> encoding?
>
> Thanks,
> arun
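
A common workaround, in case it helps anyone finding this thread later:
since textFile() decodes every line through Hadoop's Text, whose toString()
assumes UTF-8, you can read the raw Text records with hadoopFile() and
decode the bytes yourself. A minimal Scala sketch, assuming an existing
SparkContext sc; the input path and the ISO-8859-1 charset below are
placeholders, not anything from the thread:

    import java.nio.charset.Charset
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Read (offset, line) pairs as raw Hadoop Text objects instead of
    // Strings, avoiding Text.toString()'s hard-coded UTF-8 decoding.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/input")

    // Decode each line's bytes with the file's actual encoding.
    // Text.getBytes returns the backing array, so only the first
    // getLength bytes are valid.
    val lines = raw.map { case (_, line) =>
      new String(line.getBytes, 0, line.getLength, Charset.forName("ISO-8859-1"))
    }

One caveat: TextInputFormat still finds record boundaries by scanning for
'\n' bytes, so this only helps with encodings where a newline is the single
byte 0x0A (e.g. Latin-1); a UTF-16 file would need a custom InputFormat.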