Hi guys,

The latest Spark version, 1.0.2, exhibits very strange behavior when deciding on which node a given partition should reside. The following example was tested in Spark standalone mode.
    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(10)

    val dummyJob1 = sc.parallelize(0 until 10).map(x => (x, x)).partitionBy(partitioner)
    dummyJob1.foreach { case (id, x) => println("Dummy1 -> Id = " + id) }

    val dummyJob2 = sc.parallelize(0 until 10).map(x => (x, x)).partitionBy(partitioner)
    dummyJob2.foreach { case (id, x) => println("Dummy2 -> Id = " + id) }

On one node I get something like:

    Dummy1 -> Id = 2
    Dummy2 -> Id = 7

This is very strange behavior... One would expect partitions with the same ID to be placed on the same node -- that is exactly what happens with Spark 0.9.2 (and below), but not with the latest versions (tested with 1.0.1 and 1.0.2). Can anyone explain this?

Thanks in advance,
Milos
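P.S. In case it helps to reproduce this, below is a minimal sketch (assuming a spark-shell with the usual sc on the same standalone cluster) that prints, for each partition index, the hostname of the worker that processes that partition, so the placement of the two jobs can be compared side by side. The placement helper and its label parameter are just illustrative names:

    import java.net.InetAddress
    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(10)

    // Build the same keyed RDD as above and emit (label, partition index, hostname)
    // once per partition; getLocalHost runs on the executor handling that partition,
    // so identical partition IDs can be checked for co-location across the two jobs.
    def placement(label: String) =
      sc.parallelize(0 until 10)
        .map(x => (x, x))
        .partitionBy(partitioner)
        .mapPartitionsWithIndex { (idx, _) =>
          Iterator((label, idx, InetAddress.getLocalHost.getHostName))
        }
        .collect()

    placement("Dummy1").foreach(println)
    placement("Dummy2").foreach(println)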