Sorry everyone--turns out an oft-forgotten single line of code was
required to make this work:
index = 0
INDEX = sc.broadcast(index)
M = M.flatMap(func1).reduceByKey(func2)
M.foreach(debug_output)
*M.cache()*
index = 1
INDEX = sc.broadcast(index)
M = M.flatMap(func1)
M.foreach(debug_output)
Wor
To clarify about what, precisely, is impossible: the crash happens with
INDEX == 1 in func2, but func2 is only called in the reduceByKey
transformation when INDEX == 0. And according to the output of the
foreach() in line 4, that reduceByKey(func2) works just fine. How is it
then invoked again
Hi all,
This is somewhat related to my previous question (
http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html
, for additional context) but for all practical purposes this is its own
issue.
As in my previous question, I'm making