I am looking for a way in Spark Streaming to keep updating an accumulator throughout the application's life cycle.

My use case is the following: my Spark Streaming application consumes events from Kafka, and each event consists of a userID plus some other information. On the other side I have an accumulator like this:

    val ssc = new StreamingContext(conf, Seconds(3))
    val userAccu = ssc.sparkContext.accumulableCollection(mutable.HashMap[String, Long]())

Then I apply all my transformations:

    KafkaStream.map(_._2).map{...}.foreachRDD(foreachFunc = rdd => {...})

If I see a new userID, I update userAccu like this (the line lives inside the foreachRDD call):

    userAccu += (userID -> someValue)

Everything works fine up to this point. However, during idle batches (every 3 seconds, as set in ssc) I also want to update and transform my userAccu accumulator. For example, I would like to run something like this every 3 seconds:

    userAccu.value.update(userID, someValue)

Is this doable? I want to keep updating my HashMap even when no RDDs arrive from Kafka; it should be independent of Kafka. Alternatively, is this what the updateStateByKey method is meant for?

Thank you.
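To make the question concrete, here is roughly what I imagine the updateStateByKey variant would look like. This is only a sketch: KafkaStream is the Kafka DStream I already create, while parseEvent, the app name, the checkpoint path, and the Long value type are placeholders for my own code.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("UserStateSketch")
    val ssc = new StreamingContext(conf, Seconds(3))
    ssc.checkpoint("/tmp/user-state-checkpoint")  // updateStateByKey requires a checkpoint directory

    // KafkaStream is the stream I already have; parseEvent stands in for my own
    // code that turns a message payload into a (userID, someValue) pair.
    val userPairs = KafkaStream.map(_._2).map(parseEvent)

    // Invoked every batch interval (3 s) for each key that has new values or
    // existing state, even in batches where Kafka delivered nothing.
    def updateUser(newValues: Seq[Long], state: Option[Long]): Option[Long] =
      Some(state.getOrElse(0L) + newValues.sum)

    val userState = userPairs.updateStateByKey[Long](updateUser _)

    userState.foreachRDD { rdd =>
      // rdd holds the full, current (userID, value) state every 3 seconds,
      // whether or not this batch contained events
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()

If I understand correctly, foreachRDD also fires every 3 seconds even when a batch is empty, so another option might be to keep my accumulator and run userAccu.value.update(userID, someValue) inside foreachRDD whether or not the RDD has data. Is updateStateByKey the more idiomatic route here?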