Re: Analyzing consecutive elements

Adrian Tanase Thu, 22 Oct 2015 06:49:31 -0700

drop is a method on Scala's collections (Array, List, etc.), not on Spark's 
RDDs. You can use it wherever you are working with plain Scala collections, 
e.g. inside mapPartitions or something like reduceByKey, but whether that fits 
depends entirely on the use cases you have for analytics.
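
To make the zip/drop idea concrete, here is a minimal sketch on a plain Scala 
collection, with no Spark involved. The object and method names 
(ZipDropExample, consecutive) are made up for illustration.

```scala
object ZipDropExample {
  // Pair every element with its successor.
  // drop(1) skips the first element; zip stops at the shorter side,
  // so the last element never appears as the left half of a pair.
  def consecutive[T](xs: Seq[T]): Seq[(T, T)] = xs.zip(xs.drop(1))

  def main(args: Array[String]): Unit = {
    val xs = List("A", "B", "C", "D", "E") // assumed to be sorted already
    println(consecutive(xs)) // List((A,B), (B,C), (C,D), (D,E))
  }
}
```

This works because a List can be traversed more than once. An Iterator, which 
is what mapPartitions hands you, is consumed on the first pass, so it needs a 
different approach.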

The others have suggested better solutions using only Spark's APIs.
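
For reference, one way this can be done with RDD operations alone is to index 
the sorted RDD and join it with a copy shifted by one position. This is only a 
sketch under assumptions: the object name, the local[2] master and the app 
name are made up for a standalone run, and it is not necessarily what the 
others on the thread proposed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object ConsecutivePairs {
  // Turns a sorted RDD [A, B, C, ...] into [(A,B), (B,C), ...].
  def consecutivePairs[T: ClassTag](sorted: RDD[T]): RDD[(T, T)] = {
    // Attach a position to every element: (index, element).
    val indexed = sorted.zipWithIndex().map { case (elem, idx) => (idx, elem) }
    // Shift indices down by one so element i+1 lines up with element i.
    val shifted = indexed.map { case (idx, elem) => (idx - 1, elem) }
    // Joining on the index pairs each element with its successor;
    // sortByKey restores the original order after the shuffle.
    indexed.join(shifted).sortByKey().values
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("consecutive-pairs").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
      val sorted = sc.parallelize(arr).sortByKey(true)
      consecutivePairs(sorted).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike a per-partition zip, this also pairs elements that sit on either side 
of a partition boundary, at the cost of a shuffle for the join.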

-adrian

From: Sampo Niskanen
Date: Thursday, October 22, 2015 at 2:12 PM
To: Adrian Tanase
Cc: user
Subject: Re: Analyzing consecutive elements

Hi,

Sorry, I'm not very familiar with those methods and cannot find the 'drop' 
method anywhere.

As an example:

val arr = Array((1, "A"), (8, "D"), (7, "C"), (3, "B"), (9, "E"))
val rdd = sc.parallelize(arr)
val sorted = rdd.sortByKey(true)
// ... then what?

Thanks.

Best regards,

    Sampo Niskanen
    Lead developer / Wellmo

    [email protected]
    +358 40 820 5291

On Thu, Oct 22, 2015 at 10:43 AM, Adrian Tanase 
<[email protected]> wrote:
I'm not sure if there is a better way to do it directly using Spark APIs, but I 
would try to use mapPartitions and then, on the elements of each partition, do:

rdd.zip(rdd.drop(1)) - using the Scala collection APIs

This should give you what you need inside a partition. I'm hoping that you can 
partition your data somehow (e.g. by user id or session id) in a way that makes 
your algorithm parallel. In that case you can use the snippet above in a 
reduceByKey.
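
A rough sketch of the per-partition step follows. mapPartitions passes the 
function an Iterator, which can only be traversed once, so this version uses 
Iterator.sliding(2) in place of zip/drop. The names PartitionPairs and 
pairwise are hypothetical.

```scala
object PartitionPairs {
  // Pair each element with its successor within one iterator.
  // sliding(2) yields windows of two elements; a lone trailing or single
  // element produces a window of size one, which the pattern filters out.
  def pairwise[T](it: Iterator[T]): Iterator[(T, T)] =
    it.sliding(2).collect { case Seq(a, b) => (a, b) }

  def main(args: Array[String]): Unit = {
    // Stand-in for the contents of one sorted partition.
    val partition = Iterator("A", "B", "C", "D", "E")
    println(pairwise(partition).toList) // List((A,B), (B,C), (C,D), (D,E))
  }
}
```

In Spark this would be applied as 
sorted.mapPartitions(it => PartitionPairs.pairwise(it)). Pairs that straddle 
two partitions are not produced, which is why it only fits when each 
partition (or key) can be analyzed independently.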

hope this helps
-adrian

Sent from my iPhone

On 22 Oct 2015, at 09:36, Sampo Niskanen 
<[email protected]> wrote:

Hi,

I have analytics data with timestamps on each element.  I'd like to analyze 
consecutive elements using Spark, but haven't figured out how to do this.

Essentially what I'd want is a transform from a sorted RDD [A, B, C, D, E] to 
an RDD [(A,B), (B,C), (C,D), (D,E)].  (Or some other way to analyze 
time-related elements.)

How can this be achieved?

    Sampo Niskanen
    Lead developer / Wellmo

    [email protected]
    +358 40 820 5291
