You can group by multiple columns on a DataFrame, so why do you need such a complicated schema?
Suppose the df schema is (x, y, u, v, z):

df.groupBy($"x", $"y").agg(...)

Is this what you want?

On Fri, Dec 8, 2017 at 11:51 AM, Sandip Mehta <sandip.mehta....@gmail.com> wrote:
> Hi,
>
> During my aggregation I end up having the following schema.
>
> Row(Row(val1, val2), Row(val1, val2, val3, ...))
>
> val values = Seq(
>   (Row(10, 11), Row(10, 2, 11)),
>   (Row(10, 11), Row(10, 2, 11)),
>   (Row(20, 11), Row(10, 2, 11))
> )
>
> The 1st tuple is used to group the relevant records for aggregation. I have
> used the following to create a Dataset.
>
> val s = StructType(Seq(
>   StructField("x", IntegerType, true),
>   StructField("y", IntegerType, true)
> ))
> val s1 = StructType(Seq(
>   StructField("u", IntegerType, true),
>   StructField("v", IntegerType, true),
>   StructField("z", IntegerType, true)
> ))
>
> val ds = sparkSession.sqlContext.createDataset(
>   sparkSession.sparkContext.parallelize(values)
> )(Encoders.tuple(RowEncoder(s), RowEncoder(s1)))
>
> Is this the correct way of representing this?
>
> How do I create a Dataset and a row encoder for such a use case, for doing
> groupByKey on this?
>
> Regards
> Sandeep
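If it helps, here is a minimal, self-contained sketch of both routes (the column names and sample values are taken from the mail above; sum/avg are stand-in aggregates, and the case-class names Key/Value are made up for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, sum}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Route 1: flatten the nested Rows into plain columns and group by the
// key columns directly.
val df = Seq(
  (10, 11, 10, 2, 11),
  (10, 11, 10, 2, 11),
  (20, 11, 10, 2, 11)
).toDF("x", "y", "u", "v", "z")

df.groupBy($"x", $"y")
  .agg(sum($"u").as("sum_u"), avg($"z").as("avg_z"))
  .show()

// Route 2: if you want the typed groupByKey API, case classes give you
// encoders for free, so no RowEncoder / Encoders.tuple plumbing is needed.
case class Key(x: Int, y: Int)
case class Value(u: Int, v: Int, z: Int)

val ds = Seq(
  (Key(10, 11), Value(10, 2, 11)),
  (Key(10, 11), Value(10, 2, 11)),
  (Key(20, 11), Value(10, 2, 11))
).toDS()

ds.groupByKey(_._1)
  .mapGroups((key, vs) => (key, vs.map(_._2.z).sum))
  .show()

The first route is usually simpler when the key is just a couple of columns; the typed route pays off once you need mapGroups/reduceGroups logic that is awkward to express with SQL aggregates.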