I've run into an issue where a global variable used within an
UpdateStateByKey function isn't being assigned after the application
restarts from a checkpoint. Using ForEachRDD I have a global variable 'A'
that is propagated from a file every time a batch runs, and A is then used
in an UpdateStateByKey. When I initially run the application, it functions
as expected and the value of A is referenced correctly within the scope of
the update function.

However, when I bring the application down and restart, I see a different
behavior. Variable A is assigned the correct value by its corresponding
ForEachRDD function, but when the UpdateStateByKey function is executed the
new value for A isn't used. It just... disappears.

I could be going about the implementation of this wrong, but I'm hoping
that someone can point me in the correct direction.

Here's some pseudocode:

def readfile(rdd):

    global A
    a = readFromFile

def update(new, old)

    if old in A:
        do something


dstream.forEachRDD(readfile)
dstream.updateStateByKey(update)

ssc.checkpoint('checkpoint')

A is correct the first time this is run, but when the application is killed
and restarted A doesn't seem to be reassigned correctly.

Reply via email to