I think you'd probably want to look at combineByKey. I'm on my phone so I can't give you an example, but that's one solution I would try.
You would then take the resulting RDD and go back to a DF if needed.

________________________________
From: bipin<mailto:bipin....@gmail.com>
Sent: 4/29/2015 4:13 AM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: How to group multiple row data ?

Hi,

I have a ddf with schema (CustomerID, SupplierID, ProductID, Event, CreatedOn). The first three are long ints, Event can only be 1, 2 or 3, and CreatedOn is a timestamp. How can I group the rows into triplets/doublets/singlets, so that I can infer that a customer's registered event went from 1 to 2 and, if present, to 3, ordered by time and preserving the number of entries?

For example:

Before processing:
10001, 132, 2002, 1, 2012-11-23
10001, 132, 2002, 1, 2012-11-24
10031, 102, 223, 2, 2012-11-24
10001, 132, 2002, 2, 2012-11-25
10001, 132, 2002, 3, 2012-11-26
(total 5 rows)

After processing:
10001, 132, 2002, 2012-11-23, "1"
10031, 102, 223, 2012-11-24, "2"
10001, 132, 2002, 2012-11-24, "1,2,3"
(total 5 in last field - comma separated!)

The group must only take the closest previous trigger, which is why the first row appears on its own. Can this be done using Spark SQL? If it needs to be processed functionally in Scala, how would I do that? I can't wrap my head around this. Can anyone help?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-group-multiple-row-data-tp22701.html
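
Following up on the combineByKey suggestion at the top of this reply, here is a rough, untested sketch of how it might look. The names Key, Ev, Chain, splitIntoChains and groupEvents are made up for illustration and are not from the original post. It also assumes that a chain continues only while the event code strictly increases in time order, which is one reading of "closest previous trigger"; adjust the rule if your data needs something else.

```scala
import java.sql.Timestamp
import org.apache.spark.rdd.RDD

object GroupEvents {
  // Hypothetical row types for this sketch (not from the original post).
  case class Key(customerId: Long, supplierId: Long, productId: Long)
  case class Ev(event: Int, createdOn: Timestamp)
  case class Chain(key: Key, startedOn: Timestamp, events: String)

  // Split one key's events into chains, oldest first.
  // Assumption: an event joins the current chain only if its code is
  // strictly greater than the chain's last code; otherwise it starts
  // a new chain.
  def splitIntoChains(events: Seq[Ev]): Vector[Vector[Ev]] =
    events.sortBy(_.createdOn.getTime)
      .foldLeft(Vector.empty[Vector[Ev]]) { (chains, e) =>
        chains.lastOption match {
          case Some(cur) if e.event > cur.last.event =>
            chains.init :+ (cur :+ e)   // extend the most recent chain
          case _ =>
            chains :+ Vector(e)         // start a new chain
        }
      }

  // Collect all events per (customer, supplier, product) with
  // combineByKey, then emit one output row per chain.
  def groupEvents(rows: RDD[(Key, Ev)]): RDD[Chain] =
    rows
      .combineByKey[Vector[Ev]](
        (e: Ev) => Vector(e),                   // createCombiner
        (acc: Vector[Ev], e: Ev) => acc :+ e,   // mergeValue
        (a: Vector[Ev], b: Vector[Ev]) => a ++ b) // mergeCombiners
      .flatMap { case (key, evs) =>
        splitIntoChains(evs).map { chain =>
          Chain(key, chain.head.createdOn, chain.map(_.event).mkString(","))
        }
      }
}
```

On the five sample rows from the question, this rule gives "1" and "1,2,3" for key (10001, 132, 2002) and "2" for key (10031, 102, 223), which matches the expected output. Note that the combiner holds every event for a key in memory, so it behaves much like groupByKey; that should be fine unless a single key has a very large number of rows. To go back to a DF, something like sqlContext.createDataFrame(GroupEvents.groupEvents(rows)) should work on Spark 1.3 or later, since Chain is a case class.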