Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
Sorry everyone--turns out an oft-forgotten single line of code was required to make this work: index = 0 INDEX = sc.broadcast(index) M = M.flatMap(func1).reduceByKey(func2) M.foreach(debug_output) *M.cache()* index = 1 INDEX = sc.broadcast(index) M = M.flatMap(func1) M.foreach(debug_output) Wor

Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
To clarify about what, precisely, is impossible: the crash happens with INDEX == 1 in func2, but func2 is only called in the reduceByKey transformation when INDEX == 0. And according to the output of the foreach() in line 4, that reduceByKey(func2) works just fine. How is it then invoked again

Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
Hi all, This is somewhat related to my previous question ( http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html , for additional context) but for all practical purposes this is its own issue. As in my previous question, I'm making