PySpark Checkpoints with Broadcast Variables

2015-09-29 Thread Jason White
ssc.awaitTermination() The error I inevitably get when restoring from the checkpoint is: Exception: (Exception("Broadcast variable '3' not loaded!",), , (3L,)) Has anyone had any luck checkpointing in PySpark with a broadcast variable? -- View this message in c

PySpark Checkpoints with Broadcast Variables

2015-09-29 Thread Jason White
I'm having trouble loading a streaming job from a checkpoint when a broadcast variable is defined. I've seen the solution by TD in Scala ( https://issues.apache.org/jira/browse/SPARK-5206) that uses a singleton to get/create an accumulator, but I can't seem to get it to work in PySpark with a broad