Hi, I'm having trouble combining zipWithIndex with repartition. When I use them together, the subsequent action gets stuck and never returns. I'm using Spark 1.1.0.
These two lines work as expected:

    scala> sc.parallelize(1 to 10).repartition(10).count()
    res0: Long = 10

    scala> sc.parallelize(1 to 10).zipWithIndex.count()
    res1: Long = 10

But this statement gets stuck and never returns:

    scala> sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
    14/11/15 03:18:55 INFO spark.SparkContext: Starting job: apply at Option.scala:120
    14/11/15 03:18:55 INFO scheduler.DAGScheduler: Got job 3 (apply at Option.scala:120) with 3 output partitions (allowLocal=false)
    14/11/15 03:18:55 INFO scheduler.DAGScheduler: Final stage: Stage 4(apply at Option.scala:120)
    14/11/15 03:18:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
    14/11/15 03:18:55 INFO scheduler.DAGScheduler: Missing parents: List()
    14/11/15 03:18:55 INFO scheduler.DAGScheduler: Submitting Stage 4 (ParallelCollectionRDD[7] at parallelize at <console>:13), which has no missing parents
    14/11/15 03:18:55 INFO storage.MemoryStore: ensureFreeSpace(1096) called with curMem=7616, maxMem=138938941
    14/11/15 03:18:55 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 1096.0 B, free 132.5 MB)

Am I doing something wrong here, or is it a bug? Is there some workaround?

Thanks,
Lev

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/repartition-combined-with-zipWithIndex-get-stuck-tp18999.html
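P.S. In case it's relevant: my understanding from the scaladoc is that zipWithIndex has to launch an extra job to count the elements in each partition before it can assign global indices, which may be the "apply at Option.scala:120" job in the log above. A plain-Scala sketch of that indexing scheme (no Spark involved; the sample partition layout below is made up for illustration):

```scala
// Sketch of zipWithIndex's global-index assignment, simulated in plain
// Scala. In Spark, step 1 is the extra job that runs on the cluster.
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6, 7, 8, 9, 10))

// Step 1: count the elements in each partition (the extra Spark job).
val counts = partitions.map(_.size)

// Step 2: each partition's start offset is the sum of the counts
// before it (scanLeft yields 0, 3, 5, 10; zip drops the final total).
val offsets = counts.scanLeft(0L)(_ + _)

// Step 3: locally zip each element with its partition offset + position.
val indexed = partitions.zip(offsets).flatMap { case (part, start) =>
  part.zipWithIndex.map { case (x, i) => (x, start + i) }
}

println(indexed)
// List((1,0), (2,1), (3,2), ..., (10,9))
```

If that's right, the hang would be somewhere in how that derived RDD interacts with repartition's shuffle, but I haven't dug further.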