Thank you, Armbrust.

On Fri, Mar 24, 2017 at 7:02 PM, Michael Armbrust <mich...@databricks.com> wrote:
> I'm not sure you can parse this as an Array, but you can hint to the
> parser that you would like to treat source as a map instead of as a
> struct. This is a good strategy when you have dynamic columns in your data.
>
> Here is an example of the schema you can use to parse this JSON and also
> how to use explode to turn it into separate rows
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/679071429109042/2840265927289860/latest.html>.
> This blog post has more on working with semi-structured data in Spark
> <https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>.
>
> On Thu, Mar 23, 2017 at 2:49 PM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> That's why your "source" should be defined as an Array[Struct] type
>> (which makes sense in this case, since it has an undetermined length), so
>> you can explode it and get the description easily.
>>
>> As it is now, you need to write your own UDF, which may be able to do
>> what you want.
>>
>> Yong
>>
>> ------------------------------
>> *From:* Selvam Raman <sel...@gmail.com>
>> *Sent:* Thursday, March 23, 2017 5:03 PM
>> *To:* user
>> *Subject:* how to read object field within json file
>>
>> Hi,
>>
>> {
>>   "id": "test1",
>>   "source": {
>>     "F1": {
>>       "id": "4970",
>>       "eId": "F1",
>>       "description": "test1",
>>     },
>>     "F2": {
>>       "id": "5070",
>>       "eId": "F2",
>>       "description": "test2",
>>     },
>>     "F3": {
>>       "id": "5170",
>>       "eId": "F3",
>>       "description": "test3",
>>     },
>>     "F4":{}
>>     etc..
>>     "F999":{}
>> }
>>
>> I have bzip JSON files in the format above. Some JSON rows contain two
>> objects within source (like F1 and F2), sometimes five (F1, F2, F3, F4,
>> F5), etc. So the final schema will contain the combination of all objects
>> for the source field.
>>
>> Every row contains n objects, but only some of them contain valid
>> records. How can I retrieve the value of "description" in the "source"
>> field?
>>
>> source.F1.description returns the result, but how can I get all the
>> description results for every row (something like
>> "source.*.description")?
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" ("Shun bribery, stand tall")

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" ("Shun bribery, stand tall")
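For later readers, here is a minimal Spark (Scala) sketch of the map-based approach Michael describes. It assumes Spark 2.2 or later (for `DataFrameReader.json(Dataset[String])`) and one JSON object per line. The object and function names (`SourceDescriptions`, `descriptions`), the entry schema and the sample record are hypothetical, derived from the question's example; this is not the code from the linked notebook. Declaring `source` as `MapType(StringType, <struct>)` makes F1 ... F999 map keys instead of separate struct fields, and `explode` on a map column yields one row per entry with columns `key` and `value`, which gives the effect of `source.*.description`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

object SourceDescriptions {
  // Assumed shape of every entry under "source" (F1, F2, ...), taken from the question.
  val entrySchema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("eId", StringType),
    StructField("description", StringType)))

  // "source" declared as map<string, struct> instead of letting Spark infer
  // a struct with one field per key (F1 ... F999).
  val schema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("source", MapType(StringType, entrySchema))))

  // One row per (record id, source key) pair that actually has a description.
  def descriptions(df: DataFrame): DataFrame =
    df.select(col("id"), explode(col("source")))          // exploding a map yields "key" and "value"
      .select(col("id"), col("key"), col("value.description").as("description"))
      .where(col("description").isNotNull)                // drops empty entries such as "F4": {}

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("source-descriptions")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample: one JSON object per line, trimmed from the question.
    val sample = Seq(
      """{"id":"test1","source":{"F1":{"id":"4970","eId":"F1","description":"test1"},"F2":{"id":"5070","eId":"F2","description":"test2"},"F4":{}}}""")

    val df = spark.read.schema(schema).json(sample.toDS())
    descriptions(df).show(false)
    spark.stop()
  }
}
```

For the real files, the same schema can be passed to `spark.read.schema(schema).json(path)` with a path of your own; Hadoop's codecs decompress `.bz2` input transparently. If a record spans several lines, as in the question, Spark 2.2+ also needs `.option("multiLine", true)`. The sample in the question also has trailing commas, which is not valid JSON, so such records would be treated as malformed.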