Hi, any hints on how to find the location of a particular RDD partition on the cluster? Is there a workaround?
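A minimal sketch of one possible workaround, assuming PySpark: run a small job that reports, for each partition, the host name of the process that computed it. The names `locate_partition` and `sc` are hypothetical and used only for illustration. This shows where each partition's task executed on that run, which is not necessarily where the data was held before the task was launched.

```python
import socket
from typing import Iterable, Iterator, Tuple


def locate_partition(index: int, records: Iterable) -> Iterator[Tuple[int, str, int]]:
    """Yield (partition index, host name of the executing process, record count)."""
    # Consume the partition's iterator once to count its records.
    count = sum(1 for _ in records)
    # socket.gethostname() runs inside the task, so it names the host that ran it.
    yield (index, socket.gethostname(), count)


# Hypothetical usage on a cluster, assuming an existing SparkContext named `sc`:
#   rdd = sc.parallelize(range(100), 4)
#   print(rdd.mapPartitionsWithIndex(locate_partition).collect())
```

Because the function only receives a partition index and an iterator, it can be checked locally without a cluster before being passed to `mapPartitionsWithIndex`.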
The parallelize method splits the data into the specified number of partitions, or into the number given by the default parallelism configuration. Does parallelize actually distribute the partitions across the cluster, or are the partitions kept on the driver node? In the first case, is there a protocol for assigning partitions (ParallelCollectionPartition) to workers, or is the assignment just random? Otherwise, when are the partitions distributed across the cluster? Is that when tasks are launched on the partitions?

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/location-of-a-partition-in-the-cluster-how-parallelize-method-distribute-the-RDD-partitions-over-the-tp27316.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.