I encountered a similar problem reading multi-line JSON files into Spark a
while back, and here's an article I wrote about how to solve it:
http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files/
You may find it useful.
Femi
On Thu, Mar 31, 2016 at 12:32 PM, wrote:
>
You are correct that it does not take the standard JSON file format. From the
Spark Docs:
"Note that the file that is offered as a json file is not a typical JSON file.
Each line must contain a separate, self-contained valid JSON object. As a
consequence, a regular multi-line JSON file will most
Hi Charles,
The definition of object from www.json.org:
An *object* is an unordered set of name/value pairs. An object begins with {
(left brace) and ends with } (right brace). Each name is followed by :
(colon) and the name/value pairs are separated by , (comma).
It's pretty much the OOP paradigm
Hi UMESH, I think you've misunderstood the JSON definition.
There is only one object in a JSON file.
For the file people.json, as below:
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
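To make the distinction in this thread concrete, here is a minimal sketch using only Python's standard `json` module (no Spark involved). It assumes the multi-line input is a single JSON array of records; the helper names `to_json_lines` and `lines_parse_individually` are hypothetical and exist only for this illustration. It shows that a regular multi-line document fails a line-by-line parse, while the same records written one per line pass it.

```python
import json

# A regular multi-line JSON document: one array spread over several lines.
multi_line = """[
  {
    "name": "Yin",
    "address": {"city": "Columbus", "state": "Ohio"}
  },
  {
    "name": "Michael",
    "address": {"city": null, "state": "California"}
  }
]"""


def to_json_lines(text):
    """Re-serialize a JSON array (or a single object) as one record per line."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return "\n".join(json.dumps(record) for record in records)


def lines_parse_individually(text):
    """Return True if every non-empty line is valid JSON by itself."""
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return False
    return True


json_lines = to_json_lines(multi_line)

print(lines_parse_individually(multi_line))  # the first line is just "["
print(lines_parse_individually(json_lines))  # each line is a complete object
```

The converted text is the shape the quoted Spark docs describe: each line is a separate, self-contained JSON object. Loading the whole file into memory with `json.loads` is only reasonable for small inputs; that choice is an assumption of this sketch, not something from the thread.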
Hello,
Actually, I ran into the same problem when I was implementing a decision
tree algorithm with Spark and parsing the output into a comprehensible
JSON format.
So, as you said, the correct JSON format is:
[{
"name": "Yin",
"address": {
"city": "Columbus",
"sta
Hi,
Look at the image below, which is from json.org:
[image: Inline image 1]
The image above describes how the objects in the JSON below are formed:
Object 1=> {"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
Object 2=> {"name":"Michael", "address":{"city":null, "state":"California"}}
Note that