You can also do this with a sequence of case classes. In the example below
the sequence is stored in a tuple, though the outer container could also be
a case class:
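// In spark-shell the implicits needed for toDF are already imported;
// elsewhere, run `import sqlContext.implicits._` first.
case class MyRecord(name: String, location: String)

val df = Seq(
  (1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))
).toDF("id", "people")
df.printSchema()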
root
 |-- id: integer (nullable = false)
 |-- people: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- location: string (nullable = true)
If this DataFrame is saved to Parquet, the nesting will be preserved.
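
For anyone who wants to check the round trip themselves, here is a minimal sketch. It assumes a newer Spark release where SparkSession is available (on 1.4 the equivalent calls would go through sqlContext), and the object name, helper name and output path are made up for illustration. It writes the same nested data to Parquet, reads it back, and flattens the array with explode. Note that columns read back from Parquet are reported as nullable, so the schema may not match the one above exactly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode

// Defined at top level (not inside a method) so Spark can derive an encoder.
case class MyRecord(name: String, location: String)

// Hypothetical wrapper object, not part of any Spark API.
object NestedParquetSketch {

  // Builds the nested DataFrame, writes it to `path` as Parquet, reads it back.
  def roundTrip(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._

    val df = Seq(
      (1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))
    ).toDF("id", "people")

    // Overwrite so the sketch can be rerun against the same path.
    df.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-parquet-sketch")
      .master("local[*]") // assumption: local run, no cluster
      .getOrCreate()
    import spark.implicits._

    // Assumed scratch location; change as needed.
    val loaded = roundTrip(spark, "/tmp/people_nested.parquet")

    // The array-of-struct column should still be present after the round trip.
    loaded.printSchema()

    // One output row per element of the nested array.
    loaded
      .select($"id", explode($"people").as("person"))
      .select($"id", $"person.name", $"person.location")
      .show()

    spark.stop()
  }
}
```

The explode step is only there to show that the nested fields stay addressable by name (person.name, person.location) after reading the file back.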
On Tue, Jun 23, 2015 at 4:35 PM, Roberto Congiu wrote:
> I wrote a brief howto on building nested records in spark and storing them
> in parquet here:
> http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
>
> 2015-06-23 16:12 GMT-07:00 Richard Catlin:
>
>> How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
>> column? Is there an example? Will this store as a nested parquet file?
>>
>> Thanks.
>>
>> Richard Catlin
>>
>
>