[Spark Streaming] Help with updateStateByKey()

2015-04-23 Thread allonsy

Re: Help with updateStateByKey

2014-12-18 Thread Silvio Fiorito
e returns None. >> >> Instead, try: >> >> Some(currentValue.getOrElse(Seq.empty) ++ newValues) >> >> I think that should give you the expected result. >> >> >> From: Pierce Lamb >> Date: Thursday, December 18, 2014 at 2:31 PM >>
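
For reference, a minimal sketch of how that suggestion fits the function shape updateStateByKey expects, i.e. (Seq[V], Option[S]) => Option[S]; the (String, Long) value type and the updateSession name are assumptions, since the previews cut off Pierce's actual types:

    // Sketch only: the value type (String, Long) is assumed, not taken from the thread.
    // updateStateByKey calls this once per key per batch: newValues holds the key's
    // events from the current batch, state holds whatever was returned last batch.
    def updateSession(
        newValues: Seq[(String, Long)],
        state: Option[Seq[(String, Long)]]): Option[Seq[(String, Long)]] = {
      // getOrElse(Seq.empty) supplies an empty history for first-time keys,
      // so the function never falls back to returning None.
      Some(state.getOrElse(Seq.empty) ++ newValues)
    }

A keyed DStream[(K, (String, Long))] would then call updateStateByKey(updateSession) and get back a DStream[(K, Seq[(String, Long)])] of accumulated per-key sessions.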

Re: Help with updateStateByKey

2014-12-18 Thread Pierce Lamb
returns None. > > Instead, try: > > Some(currentValue.getOrElse(Seq.empty) ++ newValues) > > I think that should give you the expected result. > > > From: Pierce Lamb > Date: Thursday, December 18, 2014 at 2:31 PM > To: Silvio Fiorito > Cc: "user@spark.a

Re: Help with updateStateByKey

2014-12-18 Thread Silvio Fiorito
expected result. From: Pierce Lamb <richard.pierce.l...@gmail.com> Date: Thursday, December 18, 2014 at 2:31 PM To: Silvio Fiorito <silvio.fior...@granturing.com> Cc: "user@spark.apache.org"

Re: Help with updateStateByKey

2014-12-18 Thread Pierce Lamb
Hi Silvio, This is a great suggestion (I wanted to get rid of groupByKey); I have been trying to implement it this morning but I'm having some trouble. I would love to see your code for the function that goes inside updateStateByKey. Here is my current code: def updateGroupByKey( newValues: Seq[(St

Re: Help with updateStateByKey

2014-12-18 Thread Silvio Fiorito
Hi Pierce, You shouldn't have to use groupByKey because updateStateByKey will get a Seq of all the values for that key already. I used that for real-time sessionization as well. What I did was key my incoming events, then send them to updateStateByKey. The updateStateByKey function then receive
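
A rough sketch of the pattern described there (key the events, then let updateStateByKey accumulate per-key state); the event fields, socket source, and checkpoint path are stand-ins for details the preview cuts off:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SessionizeSketch {
      // Hypothetical event shape; the thread never shows the real one.
      case class LogEvent(ip: String, url: String, timestamp: Long)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SessionizeSketch")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/sessionize-checkpoint") // required for updateStateByKey

        // Stand-in source; the original post below reads from Kafka instead.
        val events = ssc.socketTextStream("localhost", 9999).map(parse)

        // 1) Key each event, here by client IP -- no groupByKey needed.
        val keyed = events.map(e => (e.ip, (e.url, e.timestamp)))

        // 2) updateStateByKey hands each key its new values plus the previous state.
        val sessions = keyed.updateStateByKey[Seq[(String, Long)]] {
          (newValues: Seq[(String, Long)], state: Option[Seq[(String, Long)]]) =>
            Some(state.getOrElse(Seq.empty) ++ newValues)
        }

        sessions.print()
        ssc.start()
        ssc.awaitTermination()
      }

      // Toy parser for "ip url epochMillis" lines, purely illustrative.
      private def parse(line: String): LogEvent = {
        val Array(ip, url, ts) = line.split("\\s+")
        LogEvent(ip, url, ts.toLong)
      }
    }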

Re: Help with updateStateByKey

2014-12-18 Thread Tathagata Das
Another good place to start playing with updateStateByKey is the StatefulNetworkWordCount example. See the streaming examples directory in the Spark repository. TD On Thu, Dec 18, 2014 at 6:07 AM, Pierce Lamb wrote: > I am trying to run stateful Spark Streaming computations over (fake) > apache web
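
That example lives under the streaming examples directory TD mentions; condensed, its core looks roughly like the sketch below (details such as argument handling and the checkpoint path will differ in the actual repo version):

    // Condensed sketch in the spirit of StatefulNetworkWordCount, not a verbatim copy.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StatefulWordCountSketch {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("StatefulWordCountSketch")
        val ssc = new StreamingContext(sparkConf, Seconds(1))
        ssc.checkpoint("/tmp/wordcount-checkpoint")

        // Running count per word: previous total plus this batch's occurrences.
        val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
          (values, state) => Some(state.getOrElse(0) + values.sum)

        val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
        val wordCounts = words.map(word => (word, 1)).updateStateByKey[Int](updateFunc)

        wordCounts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }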

Help with updateStateByKey

2014-12-17 Thread Pierce Lamb
I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. The goal is to "sessionize" the web traffic, similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
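
For the Kafka side of that setup, the receiver-based API in the spark-streaming-kafka module of that era looks roughly like this; the ZooKeeper quorum, group id, and topic name are placeholders rather than values from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaLogIngestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaLogIngestSketch")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("/tmp/kafka-sessionize-checkpoint")

        // Placeholders: supply the real ZooKeeper quorum, consumer group, and topic.
        val zkQuorum = "localhost:2181"
        val groupId = "log-sessionizer"
        val topics = Map("weblogs" -> 1) // topic -> number of receiver threads

        // Receiver-based stream of (key, message) pairs; the message is the raw log line.
        val logLines = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics).map(_._2)

        // From here the lines would be parsed, keyed (e.g. by client IP), and fed to
        // updateStateByKey as in the replies above.
        logLines.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }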