When there is no existing state yet for a key, currentValue is None, so
your update function just returns None.

Instead, try:

Some(currentValue.getOrElse(Seq.empty) ++ newValues)

I think that should give you the expected result.
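A complete update function built around that expression might look like the
sketch below (the (String, Long) event type is just an assumption; substitute
whatever your values actually are):

// Returns the new state for a key: everything seen so far plus this batch's
// events. getOrElse(Seq.empty) covers the first batch for a key, when no
// state exists yet, so the function no longer returns None for new keys.
// (Note: this sketch never returns None, so keys are never evicted.)
def updateGroupByKey(
    newValues: Seq[(String, Long)],
    currentValue: Option[Seq[(String, Long)]]): Option[Seq[(String, Long)]] = {
  Some(currentValue.getOrElse(Seq.empty) ++ newValues)
}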
From: Pierce Lamb <richard.pierce.l...@gmail.com>
Date: Thursday, December 18, 2014 at 2:31 PM
To: Silvio Fiorito <silvio.fior...@granturing.com>
Cc: "user@spark.apache.org"
Hi Silvio,
This is a great suggestion (I wanted to get rid of groupByKey), I have been
trying to implement it this morning, but having some trouble. I would love
to see your code for the function that goes inside updateStateByKey.
Here is my current code:
def updateGroupByKey( newValues: Seq[(St
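(For reference, a sketch of the failure mode, not necessarily the code above:
a body of the form

currentValue.map(_ ++ newValues)

returns None whenever a key has no prior state, because map on a None
produces None, so brand-new keys never accumulate any state. That is the
behavior the getOrElse(Seq.empty) version in the reply above fixes.)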
Hi Pierce,
You shouldn’t have to use groupByKey because updateStateByKey will get a
Seq of all the values for that key already.
I used that for realtime sessionization as well. What I did was key my
incoming events, then send them to updateStateByKey. The updateStateByKey
function then receives the new events for each key along with the existing
state for that key, and returns the updated session state.
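A rough sketch of that wiring (the names and the (url, timestamp) event shape
here are assumptions, not the actual code from this thread; it reuses the
updateGroupByKey function sketched earlier):

import org.apache.spark.streaming.StreamingContext._  // pair-DStream ops (Spark 1.2)
import org.apache.spark.streaming.dstream.DStream

// logs: a DStream of parsed events (clientIp, url, timestamp) coming off Kafka.
// The StreamingContext must have a checkpoint directory set for stateful ops.
def sessionize(logs: DStream[(String, String, Long)]): DStream[(String, Seq[(String, Long)])] = {
  // key each event by client IP, keeping (url, timestamp) as the value
  val events = logs.map { case (ip, url, ts) => (ip, (url, ts)) }
  // updateStateByKey hands the update function all of a key's new values
  // plus its prior state, every batch
  events.updateStateByKey[Seq[(String, Long)]](updateGroupByKey _)
}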
Another place to start playing with updateStateByKey is the
StatefulNetworkWordCount example. See the streaming examples directory in the
Spark repository.
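The heart of that example is roughly the following (paraphrased): it keeps a
running count per word across batches.

// sum this batch's counts for a word and add them to the previous total
val updateFunc = (values: Seq[Int], state: Option[Int]) => {
  val currentCount = values.sum
  val previousCount = state.getOrElse(0)
  Some(currentCount + previousCount)
}

// words: a DStream[String] of tokens from the example's socket text stream
val runningCounts = words.map(w => (w, 1)).updateStateByKey[Int](updateFunc)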
TD
On Thu, Dec 18, 2014 at 6:07 AM, Pierce Lamb wrote:
> I am trying to run stateful Spark Streaming computations over (fake)
> apache web server logs read from Kafka.
I am trying to run stateful Spark Streaming computations over (fake)
apache web server logs read from Kafka. The goal is to "sessionize"
the web traffic similar to this blog post:
http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
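A concrete starting point could look like the sketch below (the Kafka topic,
ZooKeeper quorum, consumer group, and log format are assumptions, and
updateGroupByKey is the update function sketched earlier in the thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // pair-DStream ops (Spark 1.2)
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("LogSessionizer")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("/tmp/sessionize-checkpoint")  // required for updateStateByKey

// receiver-based Kafka stream of (messageKey, logLine) pairs
val kafkaStream = KafkaUtils.createStream(
  ssc, "localhost:2181", "sessionizer", Map("weblogs" -> 1))

// naive common-log-format parse: first field is the client IP, seventh the
// request path; arrival time stands in for the log timestamp
val events = kafkaStream.map(_._2).map { line =>
  val fields = line.split(" ")
  (fields(0), (fields(6), System.currentTimeMillis()))
}

val sessions = events.updateStateByKey[Seq[(String, Long)]](updateGroupByKey _)
sessions.print()

ssc.start()
ssc.awaitTermination()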