Hi,

I'm having trouble combining zipWithIndex and repartition: when I use them
both, the subsequent action gets stuck and never returns.
I'm using Spark 1.1.0.


These two statements work as expected:

scala> sc.parallelize(1 to 10).repartition(10).count()
res0: Long = 10

scala> sc.parallelize(1 to 10).zipWithIndex.count()
res1: Long = 10


But this statement gets stuck and never returns:

scala> sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
14/11/15 03:18:55 INFO spark.SparkContext: Starting job: apply at Option.scala:120
14/11/15 03:18:55 INFO scheduler.DAGScheduler: Got job 3 (apply at Option.scala:120) with 3 output partitions (allowLocal=false)
14/11/15 03:18:55 INFO scheduler.DAGScheduler: Final stage: Stage 4(apply at Option.scala:120)
14/11/15 03:18:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/11/15 03:18:55 INFO scheduler.DAGScheduler: Missing parents: List()
14/11/15 03:18:55 INFO scheduler.DAGScheduler: Submitting Stage 4 (ParallelCollectionRDD[7] at parallelize at <console>:13), which has no missing parents
14/11/15 03:18:55 INFO storage.MemoryStore: ensureFreeSpace(1096) called with curMem=7616, maxMem=138938941
14/11/15 03:18:55 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 1096.0 B, free 132.5 MB)


Am I doing something wrong here, or is this a bug?
Is there a workaround?
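
In case it helps, these are the untested alternatives I plan to try; I'm
not sure yet whether any of them avoids the hang:

// 1) Repartition first, then index. Indices are then assigned over the
//    shuffled partitions, so the ordering differs from indexing first.
sc.parallelize(1 to 10).repartition(10).zipWithIndex.count()

// 2) zipWithUniqueId assigns unique (but non-consecutive) ids and skips
//    the extra job that zipWithIndex runs to count partition sizes.
sc.parallelize(1 to 10).zipWithUniqueId.repartition(10).count()

// 3) Materialize the indexed RDD before the shuffle.
val indexed = sc.parallelize(1 to 10).zipWithIndex.cache()
indexed.count()                 // force evaluation first
indexed.repartition(10).count()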

Thanks,
Lev.


