Michael,

I have two DataFrames: a "users" DF and an "investments" DF.  The
"investments" DF has a column that matches the "users" id.  I would like to
nest the collection of investments for each user and save the result to a
Parquet file.

Is there a straightforward way to do this?
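To make the shapes concrete, here is a rough sketch of what I mean (the
column names, case classes, and output path are just illustrative, not my
real schema; this assumes spark-shell with `import sqlContext.implicits._`
in scope and the Spark 1.4 write API):

```scala
case class Investment(userId: Int, amount: Double)

val users = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
val investments =
  Seq(Investment(1, 100.0), Investment(1, 250.0), Investment(2, 75.0)).toDF()

// Group each user's investments by the matching id, rejoin to users, and
// rebuild a DataFrame whose third column is a Seq of Investment records
// (Spark stores it as array<struct<userId:int,amount:double>>).
val byUser = investments.rdd
  .map(r => (r.getInt(0), Investment(r.getInt(0), r.getDouble(1))))
  .groupByKey()

val nested = users.rdd
  .map(r => (r.getInt(0), r.getString(1)))
  .leftOuterJoin(byUser)
  .map { case (id, (name, invs)) =>
    (id, name, invs.map(_.toSeq).getOrElse(Seq.empty[Investment]))
  }
  .toDF("id", "name", "investments")

nested.write.parquet("/tmp/users_nested.parquet")
```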

Thanks.
Richard Catlin

On Tue, Jun 23, 2015 at 4:57 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> You can also do this using a sequence of case classes (in the example
> stored in a tuple, though the outer container could also be a case class):
>
> case class MyRecord(name: String, location: String)
> val df = Seq((1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy", "Oakland")))).toDF("id", "people")
>
> df.printSchema
>
> root
>  |-- id: integer (nullable = false)
>  |-- people: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- name: string (nullable = true)
>  |    |    |-- location: string (nullable = true)
>
> If this DataFrame is saved to Parquet, the nesting will be preserved.
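> For what it's worth, round-tripping that example could look like this
> (the path is just an example; on Spark 1.3 you would use
> saveAsParquetFile / parquetFile instead of the 1.4 write/read API):

```scala
// Write the nested DataFrame out, then read it back in.
df.write.parquet("/tmp/people.parquet")

val restored = sqlContext.read.parquet("/tmp/people.parquet")
restored.printSchema  // same nested array<struct> schema as above
```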
>
> On Tue, Jun 23, 2015 at 4:35 PM, Roberto Congiu <roberto.con...@gmail.com>
> wrote:
>
>> I wrote a brief how-to on building nested records in Spark and storing
>> them in Parquet here:
>> http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
>>
>> 2015-06-23 16:12 GMT-07:00 Richard Catlin <richard.m.cat...@gmail.com>:
>>
>>> How do I create a DataFrame (SchemaRDD) with a nested array of Rows in a
>>> column?  Is there an example?  Will this be stored as a nested Parquet
>>> file?
>>>
>>> Thanks.
>>>
>>> Richard Catlin
>>>
>>
>>
>
