Hi guys,

The latest Spark version, 1.0.2, exhibits very strange behavior when deciding on which node a given partition should reside. The following example was tested in Spark standalone mode.
    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(10)

    val dummyJob1 = sc.parallelize(0 until 10).map(x => (x, x)).partitionBy(partitioner)
    dummyJob1.foreach { case (id, x) => println("Dummy1 -> Id = " + id) }

    val dummyJob2 = sc.parallelize(0 until 10).map(x => (x, x)).partitionBy(partitioner)
    dummyJob2.foreach { case (id, x) => println("Dummy2 -> Id = " + id) }

On one node I get something like:

    Dummy1 -> Id = 2
    Dummy2 -> Id = 7

This is very strange behavior... One would expect partitions with the same ID to be placed on the same node -- that is exactly what happens with Spark 0.9.2 (and below), but not with the latest versions (tested with 1.0.1 and 1.0.2). Can anyone explain this?

Thanks in advance,
Milos
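P.S. In case it helps to reproduce this, below is a minimal sketch (assuming a spark-shell with the usual sc on the same standalone cluster) that prints, for each partition index, the hostname of the worker that processes that partition, so the placement of the two jobs can be compared side by side. The placement helper and its label parameter are just illustrative names:

    import java.net.InetAddress
    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(10)

    // Build the same keyed RDD as above and emit (label, partition index, hostname)
    // once per partition; getLocalHost runs on the executor handling that partition,
    // so identical partition IDs can be checked for co-location across the two jobs.
    def placement(label: String) =
      sc.parallelize(0 until 10)
        .map(x => (x, x))
        .partitionBy(partitioner)
        .mapPartitionsWithIndex { (idx, _) =>
          Iterator((label, idx, InetAddress.getLocalHost.getHostName))
        }
        .collect()

    placement("Dummy1").foreach(println)
    placement("Dummy2").foreach(println)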