Hi, I am new to Spark Streaming. Can I think of JavaDStream's foreachRDD() as meaning "for each mini batch"? The Javadoc does not say much about this function.
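To make the question concrete, here is a stripped-down sketch of what I am doing (the socket source, host, and port are just placeholders; substitute your own stream):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ForEachRDDSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("foreachRDD-sketch");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(1));

        // Placeholder source; any DStream would do here.
        JavaDStream<String> lines =
                jssc.socketTextStream("localhost", 9999);

        // This runs ONCE, while the driver builds the DStream graph,
        // before any data arrives -- like my first attempt.
        System.out.println("setting up the pipeline");

        // This body runs once PER MINI BATCH, with the RDD for that
        // batch interval -- the behavior I was expecting.
        lines.foreachRDD((JavaRDD<String> rdd) -> {
            long count = rdd.count();   // distributed job on the workers
            System.out.println("batch count = " + count);
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```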
Here is the background. I am writing a little test program to figure out how to use streams. At some point I wanted to calculate an aggregate. In my first attempt my driver did a couple of transformations and tried to get the count(). I wrote this code the way I would write a Spark Core app and added some logging. My log statements were only executed once; however, JavaDStream.print() appears to run on each mini batch. When I put my logging and aggregation code inside foreachRDD(), things work as expected: my aggregate and logging are executed on each mini batch.

I am running on a 4-machine cluster. I create a message with the value of each mini batch's aggregate and emit it via logging and System.out. Given that the message shows up on my console, is it safe to assume that this output code is executing in my driver? The IP shows it is running on my master. I thought maybe the message is showing up there because I do not have enough data in the stream to force the load onto the workers?

Any insights would be greatly appreciated.

Andy