Something along the lines of:

Dataset<Row> df = spark.read().json(ds);

In Spark 2.2.0, DataFrameReader.json() accepts a Dataset<String> directly, so the JSON strings can be parsed without first writing them out to files.
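A minimal, self-contained sketch, assuming rows shaped like your sample; the SparkSession setup and the hard-coded sample Dataset<String> are stand-ins for your own:

import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToColumns {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("json-to-columns")
        .master("local[*]")
        .getOrCreate();

    // Stand-in for your Dataset<String>; here it holds the one sample array row.
    Dataset<String> ds = spark.createDataset(
        Collections.singletonList(
            "[{\"name\":\"foo\",\"address\":{\"state\":\"CA\",\"country\":\"USA\"},"
          + "\"docs\":[{\"subject\":\"english\",\"year\":2016}]},"
          + "{\"name\":\"bar\",\"address\":{\"state\":\"OH\",\"country\":\"USA\"},"
          + "\"docs\":[{\"subject\":\"math\",\"year\":2017}]}]"),
        Encoders.STRING());

    // Spark 2.2.0 added DataFrameReader.json(Dataset<String>): each string is
    // parsed as a JSON record, and the schema (name, address struct, docs
    // array) is inferred automatically.
    Dataset<Row> df = spark.read().json(ds);

    df.printSchema();
    df.show(false);

    spark.stop();
  }
}

As far as I can tell, a record that is a top-level JSON array (like your sample) gets expanded into one row per element, which matches the two-row layout in your table; if your real rows are single objects per string, the same call should still work.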
From: kant kodali [mailto:kanth...@gmail.com]
Sent: Saturday, October 07, 2017 2:31 AM
To: user @spark <user@spark.apache.org>
Subject: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

I have a Dataset<String> ds which consists of JSON rows.

Sample JSON row (this is just an example of one row in the dataset):

[
  {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
  {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
]

ds.printSchema()

root
 |-- value: string (nullable = true)

Now I want to convert it into the following Dataset using Spark 2.2.0:

name  | address                           | docs
------|-----------------------------------|----------------------------------------
"foo" | {"state": "CA", "country": "USA"} | [{"subject": "english", "year": 2016}]
"bar" | {"state": "OH", "country": "USA"} | [{"subject": "math", "year": 2017}]

Preferably in Java, but Scala is also fine as long as the functions are available in the Java API.