Re: Timed aggregation in Spark

2016-05-23 Thread Ofir Kerker
Yes, check out mapWithState: https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html From: Nikhil Goyal Sent: Monday, May 23, 2016 23:28 Subject: Timed aggregation in Spark To: Hi all, I want to aggregate my dat…
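mapWithState keeps per-key state across micro-batches, and when a timeout is configured on its `StateSpec`, keys that have been idle too long are dropped. The following is a minimal pure-Python sketch of that idea only, not Spark's API or implementation. It assumes each micro-batch is a list of `(key, value)` pairs, measures the timeout in batches rather than wall-clock time, and uses hypothetical names (`KeyedState`, `update_batch`, `timeout_batches`). Emitting each expired key's final total is a simplified stand-in for handling a timing-out key in the mapping function.

```python
from typing import Dict, List, Tuple


class KeyedState:
    """Toy model of per-key state carried across micro-batches.

    Hypothetical helper for illustration only; not part of Spark.
    A key whose state has not been updated for `timeout_batches`
    consecutive batches is removed and its final total is emitted.
    """

    def __init__(self, timeout_batches: int) -> None:
        self.timeout_batches = timeout_batches
        self.state: Dict[str, int] = {}      # key -> running total
        self.last_seen: Dict[str, int] = {}  # key -> index of the batch that last updated it
        self.batch_index = 0

    def update_batch(self, records: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
        """Apply one micro-batch; return (key, final_total) for keys that timed out."""
        self.batch_index += 1
        # Fold this batch's values into the running per-key totals.
        for key, value in records:
            self.state[key] = self.state.get(key, 0) + value
            self.last_seen[key] = self.batch_index
        # Keys idle for at least `timeout_batches` batches are expired.
        expired = [k for k, seen in self.last_seen.items()
                   if self.batch_index - seen >= self.timeout_batches]
        emitted = []
        for key in sorted(expired):
            emitted.append((key, self.state.pop(key)))
            del self.last_seen[key]
        return emitted
```

With `timeout_batches=2`, a key updated in batch 1 and never again is emitted and removed at the end of batch 3, so state for idle keys does not grow without bound.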

Re: mapWithState not compacting removed state

2016-04-07 Thread Ofir Kerker
Hi Iain, Did you manage to solve this issue? It looks like we have a similar issue with processing time increasing every micro-batch, but only after 30 batches. Thanks. On Thu, Mar 3, 2016 at 4:45 PM Iain Cundy wrote: > Hi All > I’m aggregating data using mapWithState with a timeout set in…

Re: Spark Streaming application code change and stateful transformations

2015-09-16 Thread Ofir Kerker
…, 2015 at 22:14 Cody Koeninger wrote: > Solution 2 sounds better to me. You aren't always going to have graceful shutdowns. > On Mon, Sep 14, 2015 at 1:49 PM, Ofir Kerker wrote: >> Hi, >> My Spark Streaming application consumes messages (events) from…

Spark Streaming application code change and stateful transformations

2015-09-14 Thread Ofir Kerker
Hi, My Spark Streaming application consumes messages (events) from Kafka every 10 seconds using the direct stream approach and aggregates these messages into hourly aggregations (to answer analytics questions like "How many users from Paris visited page X between 8PM and 9PM?") and saves the data to…
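An hourly aggregation like the one described can be thought of as keying each event by (hour bucket, city, page) and counting distinct users per key. A minimal pure-Python sketch of that keying, assuming events are dicts with hypothetical fields `ts` (epoch seconds, UTC), `city`, `page` and `user`; it does not show how the counts would be carried across micro-batches or persisted in Spark.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Set, Tuple

# (hour bucket, city, page); field names are illustrative assumptions.
Key = Tuple[str, str, str]


def hour_bucket(ts: int) -> str:
    """Label an epoch-seconds timestamp with the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:00Z")


def hourly_unique_users(events: Iterable[dict]) -> Dict[Key, int]:
    """Count distinct users per (hour bucket, city, page)."""
    users: Dict[Key, Set[str]] = defaultdict(set)
    for e in events:
        users[(hour_bucket(e["ts"]), e["city"], e["page"])].add(e["user"])
    return {k: len(v) for k, v in users.items()}
```

Counting distinct users (a set per key) rather than raw events matches the "how many users" phrasing; a user who visits page X twice in the same hour is counted once.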