I am looking for a way in Spark Streaming to keep updating an accumulator throughout the application's life cycle.

My use case is the following: my Spark Streaming application consumes events from Kafka, and each event consists of a userID plus some other information. On the other side I have an accumulator like this:

    val ssc = new StreamingContext(conf, Seconds(3))
    val userAccu = ssc.sparkContext.accumulableCollection(mutable.HashMap[String, Long]())

Then I apply all my transformations:

    KafkaStream.map(_._2).map{...}.foreachRDD(foreachFunc = rdd => {...})

If I see a new userID, I update userAccu like this (the line lives inside the foreachRDD call):

    userAccu += (userID -> someValue)

Everything works fine up to this point. However, during idle batches (every 3 seconds, as set in ssc) I also want to update and transform my userAccu accumulator. For example, I would like to run something like this every 3 seconds:

    userAccu.value.update(userID, someValue)

Is this doable? I want to keep updating my HashMap even when no RDDs arrive from Kafka; it should be independent of Kafka. Alternatively, is this what the updateStateByKey method is meant for?

Thank you.
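To make the question concrete, here is roughly what I imagine the updateStateByKey variant would look like. This is only a sketch: KafkaStream is the Kafka DStream I already create, while parseEvent, the app name, the checkpoint path, and the Long value type are placeholders for my own code.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("UserStateSketch")
    val ssc = new StreamingContext(conf, Seconds(3))
    ssc.checkpoint("/tmp/user-state-checkpoint")  // updateStateByKey requires a checkpoint directory

    // KafkaStream is the stream I already have; parseEvent stands in for my own
    // code that turns a message payload into a (userID, someValue) pair.
    val userPairs = KafkaStream.map(_._2).map(parseEvent)

    // Invoked every batch interval (3 s) for each key that has new values or
    // existing state, even in batches where Kafka delivered nothing.
    def updateUser(newValues: Seq[Long], state: Option[Long]): Option[Long] =
      Some(state.getOrElse(0L) + newValues.sum)

    val userState = userPairs.updateStateByKey[Long](updateUser _)

    userState.foreachRDD { rdd =>
      // rdd holds the full, current (userID, value) state every 3 seconds,
      // whether or not this batch contained events
      rdd.take(10).foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()

If I understand correctly, foreachRDD also fires every 3 seconds even when a batch is empty, so another option might be to keep my accumulator and run userAccu.value.update(userID, someValue) inside foreachRDD whether or not the RDD has data. Is updateStateByKey the more idiomatic route here?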