I just followed Hien Luu's approach:

val empExplode = empInfoStrDF.select(explode(from_json('emp_info_str,
  empInfoSchema)).as("emp_info_withexplode"))


empExplode.show(false)

+-------------------------------------------+
|emp_info_withexplode                       |
+-------------------------------------------+
|[foo,[CA,USA],WrappedArray([english,2016])]|
|[bar,[OH,USA],WrappedArray([math,2017])]   |
+-------------------------------------------+

empExplode.select($"emp_info_withexplode.name").show(false)


+----+
|name|
+----+
|foo |
|bar |
+----+

empExplode.select($"emp_info_withexplode.address.state").show(false)

+-----+
|state|
+-----+
|CA   |
|OH   |
+-----+

empExplode.select($"emp_info_withexplode.docs.subject").show(false)

+---------+
|subject  |
+---------+
|[english]|
|[math]   |
+---------+


@Kant Kodali, is that helpful for you? If not, please let me know what
changes you are expecting.




On Sun, Jan 7, 2018 at 12:16 AM, Jules Damji <[email protected]> wrote:

> Here are a couple of tutorials that show how to extract structured nested
> data:
>
> https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html
>
> https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html
>
> On Jan 6, 2018, at 11:42 AM, Hien Luu <[email protected]> wrote:
>
> Hi Kant,
>
> I am not sure whether you have come up with a solution yet, but the
> following works for me (in Scala):
>
> val emp_info = """
>  [
>    {"name": "foo", "address": {"state": "CA", "country": "USA"},
> "docs":[{"subject": "english", "year": 2016}]},
>    {"name": "bar", "address": {"state": "OH", "country": "USA"},
> "docs":[{"subject": "math", "year": 2017}]}
>  ]"""
>
> import org.apache.spark.sql.types._
>
> val addressSchema = new StructType().add("state", StringType).add("country", StringType)
> val docsSchema = ArrayType(new StructType().add("subject", StringType).add("year", IntegerType))
> val employeeSchema = new StructType().add("name", StringType).add("address", addressSchema).add("docs", docsSchema)
>
> val empInfoSchema = ArrayType(employeeSchema)
>
> empInfoSchema.json
>
> val empInfoStrDF = Seq((emp_info)).toDF("emp_info_str")
> empInfoStrDF.printSchema
> empInfoStrDF.show(false)
>
> val empInfoDF = empInfoStrDF.select(from_json('emp_info_str,
> empInfoSchema).as("emp_info"))
> empInfoDF.printSchema
>
> empInfoDF.select(struct("*")).show(false)
>
> empInfoDF.select("emp_info.name", "emp_info.address",
> "emp_info.docs").show(false)
>
> empInfoDF.select(explode('emp_info.getItem("name"))).show
>
>
>
>


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
