Sorry, but I didn't fully understand the grouping. This line:

>> The group must only take the closest previous trigger. The first one
>> hence shows alone.

Can you please explain further?
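
In the meantime, here is my best guess at the rule, as a rough sketch, in
case I have read it right. I'm assuming "closest previous trigger" means
each row opens a new group unless its event number is strictly greater than
the last event in the currently open group for the same
(CustomerID, SupplierID, ProductID). The case classes and field names below
are my own guesses, untested against your data:

import java.sql.Timestamp
import org.apache.spark.rdd.RDD

// Assumed shapes for the input and output rows; adjust to your schema.
case class EventRow(customerId: Long, supplierId: Long, productId: Long,
                    event: Int, createdOn: Timestamp)

case class GroupedRow(customerId: Long, supplierId: Long, productId: Long,
                      startedOn: Timestamp, events: String)

def sessionize(rows: RDD[EventRow]): RDD[GroupedRow] =
  rows
    .groupBy(r => (r.customerId, r.supplierId, r.productId))
    .flatMap { case ((cust, sup, prod), rs) =>
      // Sort each key's rows by time, then fold: a row whose event is
      // strictly greater than the last event of the open group attaches
      // to that group (the closest previous trigger); anything else
      // starts a new group of its own.
      val sorted = rs.toSeq.sortBy(_.createdOn.getTime)
      val groups = sorted.foldLeft(List.empty[(Timestamp, List[Int])]) {
        case ((start, evs) :: tail, r) if r.event > evs.last =>
          (start, evs :+ r.event) :: tail      // extend the open group
        case (acc, r) =>
          (r.createdOn, List(r.event)) :: acc  // open a new group
      }.reverse
      groups.map { case (start, evs) =>
        GroupedRow(cust, sup, prod, start, evs.mkString(","))
      }
    }

On your five sample rows this yields (10001, 132, 2002, 2012-11-23, "1"),
(10031, 102, 223, 2012-11-24, "2") and (10001, 132, 2002, 2012-11-24,
"1,2,3"). I don't think the Spark SQL dialect has the window functions this
would otherwise need yet, which is why the sketch drops down to RDDs and a
per-key fold.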


On Wed, Apr 29, 2015 at 4:42 PM, bipin <bipin....@gmail.com> wrote:

> Hi, I have a DataFrame with schema (CustomerID, SupplierID, ProductID,
> Event, CreatedOn); the first three are Long ints, Event can only be 1, 2,
> or 3, and CreatedOn is a timestamp. How can I make group
> triplets/doublets/singlets out of them such that I can infer that the
> customer registered events from 1 to 2 and, if present, to 3 in time
> order, while preserving the number of entries? For example:
>
> Before processing:
> 10001, 132, 2002, 1, 2012-11-23
> 10001, 132, 2002, 1, 2012-11-24
> 10031, 102, 223, 2, 2012-11-24
> 10001, 132, 2002, 2, 2012-11-25
> 10001, 132, 2002, 3, 2012-11-26
> (total 5 rows)
>
> After processing:
> 10001, 132, 2002, 2012-11-23, "1"
> 10031, 102, 223, 2012-11-24, "2"
> 10001, 132, 2002, 2012-11-24, "1,2,3"
> (the comma-separated last fields hold 5 events in total, matching the
> 5 input rows)
>
> The group must only take the closest previous trigger. The first one hence
> shows alone. Can this be done using Spark SQL? If it needs to be processed
> functionally in Scala, how would I do that? I can't wrap my head around
> this. Can anyone help?
