Thanks, Ewan. I did it the same way you explained. Thanks once again for your
response.
On Mon, May 9, 2016 at 4:21 PM, Ewan Leith wrote:
The simplest way is probably to use the sc.binaryFiles or sc.wholeTextFiles API
to create an RDD containing the JSON files (you may need a
sc.wholeTextFiles(…).map(x => x._2) to drop the filename column), then do a
sqlContext.read.json(rddName)
That way, you don't need to worry about combining lines yourself.
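
For example, a minimal sketch of that pipeline (the input path is a
placeholder; Spark 1.x APIs, as used in this thread):

// Read each file as a single (filename, content) pair, then keep only the content.
val fileContents = sc.wholeTextFiles("hdfs:///data/json/").map(x => x._2)

// Each RDD element is a complete document, so read.json can parse it
// even when the JSON spans multiple lines.
val df = sqlContext.read.json(fileContents)
df.printSchema()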
This limit is due to the underlying InputFormat implementation. You can always
write your own InputFormat and then use Spark's newAPIHadoopFile API to pass
your InputFormat class. You will have to place the jar file in the /lib
location on all the nodes.
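
To illustrate, a rough sketch of such a whole-file InputFormat (the class names
here are hypothetical and error handling is omitted), together with the
newAPIHadoopFile call that uses it:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IOUtils, NullWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}

class WholeFileInputFormat extends FileInputFormat[NullWritable, Text] {
  // Never split a file, so a multi-line JSON document stays in one record.
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(split: InputSplit,
                                  context: TaskAttemptContext): RecordReader[NullWritable, Text] =
    new WholeFileRecordReader
}

class WholeFileRecordReader extends RecordReader[NullWritable, Text] {
  private var split: FileSplit = _
  private var conf: Configuration = _
  private val value = new Text()
  private var processed = false

  override def initialize(inputSplit: InputSplit, context: TaskAttemptContext): Unit = {
    split = inputSplit.asInstanceOf[FileSplit]
    conf = context.getConfiguration
  }

  // Emit exactly one record: the entire file contents.
  override def nextKeyValue(): Boolean = {
    if (processed) return false
    val path = split.getPath
    val in = path.getFileSystem(conf).open(path)
    try {
      val bytes = new Array[Byte](split.getLength.toInt)
      IOUtils.readFully(in, bytes, 0, bytes.length)
      value.set(bytes, 0, bytes.length)
    } finally {
      in.close()
    }
    processed = true
    true
  }

  override def getCurrentKey: NullWritable = NullWritable.get()
  override def getCurrentValue: Text = value
  override def getProgress: Float = if (processed) 1.0f else 0.0f
  override def close(): Unit = ()
}

// Driver side: read whole files through the custom InputFormat,
// then hand the raw JSON strings to the SQL reader.
val rdd = sc.newAPIHadoopFile[NullWritable, Text, WholeFileInputFormat]("hdfs:///data/json/")
  .map(pair => pair._2.toString)
val df = sqlContext.read.json(rdd)

Note that this reads each file fully into memory as a single record, so very
large individual files can be a problem.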
Ashish
On Sun, May 8, 2016 at 4:02 PM, Hyukji wrote:
I remember this JIRA: https://issues.apache.org/jira/browse/SPARK-7366.
Parsing multiple lines is not supported in the JSON data source.
Instead, this can be done with sc.wholeTextFiles(). I found some examples
here:
http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files
Altho
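
For reference, a short sketch in the spirit of those examples, parsing each
whole file with json4s (which ships with Spark); the input path and the "id"
field are assumptions:

import org.json4s._
import org.json4s.jackson.JsonMethods._

// Placeholder path; each file is assumed to hold one multi-line JSON object.
val files = sc.wholeTextFiles("hdfs:///data/json/")

// Parse each file's full text as one document and pull out a field.
// The "id" field is only an example.
val ids = files.map { case (_, content) =>
  implicit val formats = DefaultFormats
  (parse(content) \ "id").extract[String]
}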