Hi Spark community, I was hoping someone could help me by running the code snippet below in the spark shell and checking whether they see the same buggy behavior I do. Full details of the bug are in the JIRA issue I filed: https://issues.apache.org/jira/browse/SPARK-10942.
The issue was closed as "cannot reproduce", but I can't seem to shake it. I have worked on this for a while, removing all known variables and trying different versions of Spark (1.5.0, 1.5.1, master) and different OSes (Mac OS X, Debian Linux). My coworkers have tried as well and see the same behavior. This has me convinced that I can't be the only one in the community able to reproduce it.

If you have a minute or two, please open a spark shell and copy/paste the code below. After 30 seconds, check the Spark UI's Storage tab. If you see some cached RDDs listed, then the bug has been reproduced. If not, then there is no bug... and I may be losing my mind.

Thanks in advance!
Nick

------------

import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

val ssc = new StreamingContext(sc, Seconds(1))

// 30 one-element RDDs fed to the stream, one per batch
val inputRDDs = mutable.Queue.tabulate(30) { i => sc.parallelize(Seq(i)) }
val input = ssc.queueStream(inputRDDs)

val output = input.transform { rdd =>
  if (rdd.isEmpty()) {
    rdd
  } else {
    // cache an intermediate RDD and reuse it twice downstream
    val rdd2 = rdd.map(identity)
    rdd2.cache()
    rdd2.setName(rdd.first().toString)
    val rdd3 = rdd2.map(identity) ++ rdd2.map(identity)
    rdd3
  }
}

output.print()
ssc.start()
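If it's easier than opening the UI, I believe the same check can be done from the shell itself: as far as I know, sc.getPersistentRDDs reports the RDDs the driver still considers persisted, so if the bug reproduces, something like the following sketch should print a growing list of cached RDDs after ~30 seconds (and nothing if it doesn't):

// Rough check run in the same spark-shell session after the stream has been
// running for a while; each entry corresponds to a row in the Storage tab.
sc.getPersistentRDDs.values.foreach { rdd =>
  println(s"still cached: id=${rdd.id} name=${rdd.name}")
}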