OK, consider the case where there are multiple event triggers for a given customer/supplier/product, e.g. 1,1,2,2,3 arranged in order of *event* *occurrence* (timestamp). The output should then be two groups, (1,2) and (1,2,3): the doublet takes the first occurrence of 1,2 and the triplet the later occurrences 1,2,3.
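If I've read that rule right, here is a minimal sketch of it in plain Scala (Row, Group and segment are made-up names for illustration; the fields mirror the schema in the message below):

import java.sql.Timestamp
import scala.collection.mutable.ArrayBuffer

// Hypothetical case classes mirroring the (CustomerID, SupplierID,
// ProductID, Event, CreatedOn) schema from the original question.
case class Row(customerId: Long, supplierId: Long, productId: Long,
               event: Int, createdOn: Timestamp)
case class Group(customerId: Long, supplierId: Long, productId: Long,
                 startedOn: Timestamp, events: List[Int])

// Segment one key's events, ordered by CreatedOn, into groups: an event n
// attaches to the most recently extended open group whose last event is
// n - 1 (the "closest previous trigger"); otherwise it starts a new group.
def segment(rows: Seq[Row]): Seq[Group] = {
  val sorted = rows.sortBy(_.createdOn.getTime)
  val open = ArrayBuffer.empty[(Timestamp, List[Int])]
  for (r <- sorted) {
    // Closest open group expecting this event; -1 if none (e.g. event 1).
    val idx = open.lastIndexWhere { case (_, evs) => evs.last == r.event - 1 }
    if (idx >= 0) {
      val (start, evs) = open(idx)
      open(idx) = (start, evs :+ r.event)      // extend the closest open group
    } else {
      open += ((r.createdOn, List(r.event)))   // no match: start a new group
    }
  }
  val h = sorted.head                          // assumes rows is non-empty
  open.toSeq.map { case (start, evs) =>
    Group(h.customerId, h.supplierId, h.productId, start, evs)
  }
}

For 1,1,2,2,3 this produces exactly the two groups (1,2) and (1,2,3), and for the example below it leaves the first 1 alone and builds "1,2,3" starting from the 2012-11-24 row.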
On 29 April 2015 at 18:04, Manoj Awasthi <awasthi.ma...@gmail.com> wrote:
> Sorry, but I didn't fully understand the grouping. This line:
>
> >> The group must only take the closest previous trigger. The first one
> >> hence shows alone.
>
> Can you please explain further?
>
> On Wed, Apr 29, 2015 at 4:42 PM, bipin <bipin....@gmail.com> wrote:
>
>> Hi, I have a ddf with schema (CustomerID, SupplierID, ProductID, Event,
>> CreatedOn); the first three are Longs, Event can only be 1, 2 or 3, and
>> CreatedOn is a timestamp. How can I make group triplets/doublets/singlets
>> out of them such that I can infer that a customer registered events from
>> 1 to 2 and, if present, to 3 time-wise, while preserving the number of
>> entries? For example:
>>
>> Before processing:
>> 10001, 132, 2002, 1, 2012-11-23
>> 10001, 132, 2002, 1, 2012-11-24
>> 10031, 102, 223, 2, 2012-11-24
>> 10001, 132, 2002, 2, 2012-11-25
>> 10001, 132, 2002, 3, 2012-11-26
>> (total 5 rows)
>>
>> After processing:
>> 10001, 132, 2002, 2012-11-23, "1"
>> 10031, 102, 223, 2012-11-24, "2"
>> 10001, 132, 2002, 2012-11-24, "1,2,3"
>> (total of 5 events across the last, comma-separated field!)
>>
>> The group must only take the closest previous trigger. The first one
>> hence shows alone. Can this be done using Spark SQL? If it needs to be
>> processed functionally in Scala, how do I do this? I can't wrap my head
>> around it. Can anyone help.
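On the Spark SQL question: I can't see a clean way to express the closest-previous-trigger matching in plain SQL, but with a segment function like the one above the functional route is short. A sketch, assuming the ddf's rows have already been mapped into the hypothetical Row class:

import org.apache.spark.rdd.RDD

// rows: RDD[Row], obtained by mapping the raw data into Row.
def groupTriggers(rows: RDD[Row]): RDD[Group] =
  rows
    .groupBy(r => (r.customerId, r.supplierId, r.productId)) // collect each key's history
    .flatMap { case (_, rs) => segment(rs.toSeq) }           // segment it time-wise

// To render the comma-separated last field from the expected output:
// groupTriggers(rows).map(g =>
//   (g.customerId, g.supplierId, g.productId, g.startedOn, g.events.mkString(",")))

One caveat: groupBy shuffles all rows for a key to a single executor, which is fine while per-key histories are small; for very large keys a secondary sort (repartitionAndSortWithinPartitions on a composite key) would scale better.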