Something along the lines of:

Dataset<Row> df = spark.read().json(ds); ?
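Fleshed out a bit (a sketch, not tested: it assumes a SparkSession named spark and the Dataset<String> ds from the question, and relies on the JSON reader fanning a top-level array out into one row per element):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// New in Spark 2.2: DataFrameReader.json(Dataset<String>) parses each
// string element as JSON and infers the schema from the data.
Dataset<Row> df = spark.read().json(ds);

df.printSchema();  // should show name, address (struct), docs (array of struct)
df.show(false);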


From: kant kodali [mailto:kanth...@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark <user@spark.apache.org>
Subject: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?


I have a Dataset<String> ds that consists of JSON rows.

Sample JSON row (this is just an example of one row in the dataset):

[
    {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
    {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
]

ds.printSchema()

root
 |-- value: string (nullable = true)
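(Reproducing that input for testing: a Dataset<String> with this exact schema can be built along the following lines, assuming an existing SparkSession named spark.)

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// One string element holding the whole JSON array from above.
String row = "[{\"name\": \"foo\", \"address\": {\"state\": \"CA\", \"country\": \"USA\"}, "
    + "\"docs\": [{\"subject\": \"english\", \"year\": 2016}]}, "
    + "{\"name\": \"bar\", \"address\": {\"state\": \"OH\", \"country\": \"USA\"}, "
    + "\"docs\": [{\"subject\": \"math\", \"year\": 2017}]}]";

Dataset<String> ds = spark.createDataset(Arrays.asList(row), Encoders.STRING());
ds.printSchema();  // root |-- value: string (nullable = true)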

Now I want to convert it into the following dataset using Spark 2.2.0:

name  |             address               |  docs

----------------------------------------------------------------------------------

"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]

"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]

Preferably in Java, but Scala is also fine as long as the functions are available in the Java API.
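(For what it's worth, once the JSON is parsed as in the reply above, selecting those three columns in Java is straightforward; a sketch, with column names taken from the sample data:)

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> parsed = spark.read().json(ds);

// Keep the struct and array columns as-is ...
Dataset<Row> result = parsed.select(col("name"), col("address"), col("docs"));
result.show(false);

// ... or drill into nested fields with dot syntax, e.g.:
// parsed.select(col("name"), col("address.state"), col("address.country"));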
