I have a Spark Streaming application with the following structure:
var ts: Long = 0L
dstream1.foreachRDD { (x, time) =>
  ts = time.milliseconds  // Time -> Long (epoch millis), so it fits the Long var
  x.do_something()...
}
......
process_data(dstream2, ts, ......)
I assume the foreachRDD call can update the "ts" variable, which is then
used in the Spark tasks of the "process_data" function.
In my tests on a standalone Spark cluster, this works. But should I be
concerned if I switch to YARN?
I have also seen some articles that recommend avoiding mutable state in
Scala programming. Without the state variable, how could this be done?
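To make the question concrete, here is one possible stateless shape, as a rough untested sketch only. It assumes both streams belong to the same StreamingContext and share a batch interval, so the batch time that dstream2 sees equals the one dstream1 would have stored in "ts". The object name, the processData signature, and the String element type are hypothetical stand-ins for the real process_data:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object BatchTimeSketch {
  // Hypothetical stand-in for process_data: tags each record with its batch time.
  // The time comes from transform's (RDD, Time) overload instead of a shared var.
  def processData(stream: DStream[String]): DStream[(Long, String)] =
    stream.transform { (rdd: RDD[String], time: Time) =>
      val ts = time.milliseconds          // immutable local, one value per batch
      rdd.map(record => (ts, record))     // ts is captured by the task closure
    }
}
```

It would be called as BatchTimeSketch.processData(dstream2), with no "ts" argument passed in. Whether this is preferable to the mutable variable is part of what I am asking.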
Any comments or suggestions are appreciated.
Thanks,
Haopu