No, the number of partitions is determined by the numPartitions parameter you pass to groupByKey (or by the default parallelism if you don't set one). See http://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions for details; I'd suggest reading the docs before asking.
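For example, here is a rough PySpark sketch (assuming a live SparkContext named sc, as in your snippet):

x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])

# Without an argument, groupByKey falls back to the default parallelism,
# not the number of distinct keys.
print(x.groupByKey().getNumPartitions())

# Pass numPartitions explicitly to control the partition count.
print(x.groupByKey(numPartitions=2).getNumPartitions())   # prints 2

collect() still returns all grouped pairs regardless of how many partitions the RDD has; the parameter only controls how the data is spread across the cluster.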
Joe L wrote:
> I was wondering if groupByKey returns 2 partitions in the below example?
>
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> sorted(x.groupByKey().collect())
> [('a', [1, 1]), ('b', [1])]