Location of a partition in the cluster / how the parallelize method distributes RDD partitions over the cluster

Mazen Sun, 10 Jul 2016 05:59:23 -0700

Hi, 

Any hints on how to get the location of a particular RDD partition on the
cluster? Or is there a workaround?



The parallelize method partitions the RDD into splits, either as specified
or as per the default parallelism configuration. Does parallelize actually
distribute the partitions across the cluster, or are the partitions kept on
the driver node? In the first case, is there a protocol for assigning/mapping
partitions (ParallelCollectionPartition) to workers, or is it just random?
Otherwise, when are the partitions distributed across the cluster? Is that
when tasks are launched on the partitions?
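
To make the first part of the question concrete, here is a minimal sketch of
how I assume the collection is cut into splits before any placement happens.
It is plain Python, not Spark code: the helper name slice_collection is
hypothetical, and the even, contiguous index arithmetic is my assumption about
the slicing step. It says nothing about which worker a split ends up on, which
is the part I am asking about.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_collection(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Split data into num_slices contiguous, near-equal chunks.

    Hypothetical helper for illustration only; assumes split i covers
    indices [i * n // num_slices, (i + 1) * n // num_slices).
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(data)
    slices = []
    for i in range(num_slices):
        start = (i * n) // num_slices
        end = ((i + 1) * n) // num_slices
        slices.append(list(data[start:end]))
    return slices


if __name__ == "__main__":
    # Ten elements cut into three splits.
    for index, part in enumerate(slice_collection(list(range(10)), 3)):
        print(index, part)
```

Under that assumption, ten elements in three splits would give sizes 3, 3
and 4, with every element in exactly one split.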

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/location-of-a-partition-in-the-cluster-how-parallelize-method-distribute-the-RDD-partitions-over-the-tp27316.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
