Re: Nested DataFrame(SchemaRDD)

Michael Armbrust Tue, 23 Jun 2015 16:58:34 -0700

You can also do this using a sequence of case classes (in this example the
sequence is stored in a tuple, though the outer container could also be a case class):


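// spark-shell imports sqlContext.implicits._ automatically; in a standalone
// application, add `import sqlContext.implicits._` so that toDF is available.
case class MyRecord(name: String, location: String)
val df = Seq(
  (1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))
).toDF("id", "people")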

df.printSchema

root
 |-- id: integer (nullable = false)
 |-- people: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- location: string (nullable = true)
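For comparison, the same nested shape can be declared explicitly with generic `Row` objects and a hand-written `StructType`, which is closer to the "nested array of Rows" wording in the original question. This is a sketch that assumes Spark 2.0 or later (`SparkSession`); on 1.x the equivalent call is `sqlContext.createDataFrame(rdd, schema)`. The object name `ExplicitNestedSchema` and the helper `buildPeopleDF` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object ExplicitNestedSchema {
  // Hypothetical helper: builds the same id/people shape as the case-class
  // example, but from generic Rows plus an explicit schema.
  def buildPeopleDF(spark: SparkSession): DataFrame = {
    // Schema of one element of the "people" array.
    val personType = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("location", StringType, nullable = true)))

    // Top-level schema: a non-null integer id plus an array of person structs.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("people", ArrayType(personType, containsNull = true), nullable = true)))

    // Each nested struct is itself a Row; the array column holds a Seq of them.
    val rows = Seq(
      Row(1, Seq(Row("Michael", "Berkeley"), Row("Andy", "Oakland"))))

    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ running in local mode.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explicit-nested-schema")
      .getOrCreate()
    buildPeopleDF(spark).printSchema()
    spark.stop()
  }
}
```

Because the schema is written to match the case-class version field for field, `printSchema()` should print the same tree. The case-class route is shorter; the explicit route is useful when the structure is only known at runtime.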

If this DataFrame is saved to Parquet, the nesting will be preserved.
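A minimal round trip that writes the DataFrame to Parquet and reads it back is one way to check this. The sketch assumes Spark 2.0+ (`spark.read.parquet`); on 1.4 `df.write.parquet(path)` works the same and the reader is `sqlContext.read.parquet(path)`. The object name `NestedParquetRoundTrip`, the `roundTrip` helper and the temporary output path are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Same shape as the case class in the example above; defined at top level
// so Spark can derive an encoder for it.
case class MyRecord(name: String, location: String)

object NestedParquetRoundTrip {
  // Hypothetical helper: writes the nested DataFrame under `dir` and reads it back.
  def roundTrip(spark: SparkSession, dir: String): DataFrame = {
    import spark.implicits._
    val df = Seq(
      (1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))
    ).toDF("id", "people")

    val path = s"$dir/people.parquet"
    df.write.parquet(path)    // nested array<struct> is written as a Parquet group
    spark.read.parquet(path)  // schema is recovered from the Parquet file metadata
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ running in local mode.
    val spark = SparkSession.builder().master("local[*]").appName("nested-parquet").getOrCreate()
    val dir = Files.createTempDirectory("nested-parquet").toString
    val restored = roundTrip(spark, dir)
    restored.printSchema()
    restored.show(false)
    spark.stop()
  }
}
```

The restored schema should still have `people` as an array of `name`/`location` structs. One difference to expect: the Spark SQL guide notes that columns are converted to nullable when written to Parquet, so `id` may come back as `nullable = true`.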

On Tue, Jun 23, 2015 at 4:35 PM, Roberto Congiu <roberto.con...@gmail.com>
wrote:

> I wrote a brief how-to on building nested records in Spark and storing them
> in Parquet here:
> http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
>
> 2015-06-23 16:12 GMT-07:00 Richard Catlin <richard.m.cat...@gmail.com>:
>
>> How do I create a DataFrame (SchemaRDD) with a nested array of Rows in a
>> column?  Is there an example?  Will this be stored as a nested Parquet file?
>>
>> Thanks.
>>
>> Richard Catlin
>>
>
>
