Michael, I have two DataFrames: a "users" DF and an "investments" DF. The "investments" DF has a column that matches the id column of the "users" DF. I would like to nest the collection of investments for each user and save the result to a Parquet file.
Is there a straightforward way to do this? Thanks.

Richard Catlin

On Tue, Jun 23, 2015 at 4:57 PM, Michael Armbrust <mich...@databricks.com> wrote:

> You can also do this using a sequence of case classes (in the example
> stored in a tuple, though the outer container could also be a case class):
>
> case class MyRecord(name: String, location: String)
> val df = Seq((1, Seq(MyRecord("Michael", "Berkeley"), MyRecord("Andy",
> "Oakland")))).toDF("id", "people")
>
> df.printSchema
>
> root
>  |-- id: integer (nullable = false)
>  |-- people: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- name: string (nullable = true)
>  |    |    |-- location: string (nullable = true)
>
> If this DataFrame is saved to Parquet, the nesting will be preserved.
>
> On Tue, Jun 23, 2015 at 4:35 PM, Roberto Congiu <roberto.con...@gmail.com> wrote:
>
>> I wrote a brief how-to on building nested records in Spark and storing
>> them in Parquet here:
>> http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
>>
>> 2015-06-23 16:12 GMT-07:00 Richard Catlin <richard.m.cat...@gmail.com>:
>>
>>> How do I create a DataFrame (SchemaRDD) with a nested array of Rows in a
>>> column? Is there an example? Will this store as a nested Parquet file?
>>>
>>> Thanks.
>>>
>>> Richard Catlin
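
For the users/investments question above, a minimal sketch along the lines of Michael's case-class approach might look like the following. The column names are hypothetical (assume "users" has id and name, and "investments" has user_id, symbol, and amount), and a Spark 1.x sqlContext is assumed to be in scope, as in the spark-shell; this is a sketch under those assumptions, not a confirmed solution from the thread.

    // Hypothetical schemas: users(id: Int, name: String),
    // investments(user_id: Int, symbol: String, amount: Double).
    import sqlContext.implicits._

    case class Investment(symbol: String, amount: Double)
    case class UserWithInvestments(id: Int, name: String,
                                   investments: Seq[Investment])

    // Key each DataFrame's rows by the user id so they can be joined as RDDs.
    val usersById = users.rdd.map(r =>
      (r.getAs[Int]("id"), r.getAs[String]("name")))
    val investmentsByUser = investments.rdd.map(r =>
      (r.getAs[Int]("user_id"),
       Investment(r.getAs[String]("symbol"), r.getAs[Double]("amount"))))

    // Group each user's investments and nest them as a sequence of case
    // classes; users with no investments get an empty array.
    val nested = usersById
      .leftOuterJoin(investmentsByUser.groupByKey())
      .map { case (id, (name, invs)) =>
        UserWithInvestments(id, name, invs.map(_.toSeq).getOrElse(Seq.empty))
      }
      .toDF()

    // As Michael notes, the array-of-struct nesting is preserved in Parquet.
    nested.write.parquet("users_with_investments.parquet")

Note that groupByKey gathers each user's investments onto a single node, which is fine as long as no single user has an enormous number of investments.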