groupByKey() takes a numPartitions parameter, so you can set it to control how many partitions the grouped RDD has. If you don't set it, the resulting RDD ends up with the same number of partitions as the parent RDD. Either way, each key is hashed into one of those partitions, so you don't get a single partition per key.
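For what it's worth, here is a minimal PySpark sketch of what I mean. The HDFS path, the parser, and the partition count of 100 are all placeholders, not something from your job:

    from pyspark import SparkContext

    sc = SparkContext(appName="groupByKeyPartitions")

    # Hypothetical parser: turn each text line into a (key, value) pair.
    def parser_func(line):
        fields = line.split("\t")
        return fields[0], fields[1]

    # Passing numPartitions spreads the grouped data over 100 partitions
    # instead of inheriting the partition count of the parent RDD.
    rdd = (sc.textFile("hdfs://namenode/path/to/data")  # placeholder path
             .map(parser_func)
             .groupByKey(numPartitions=100))

    print(rdd.getNumPartitions())  # should print 100

So a key whose values don't fit in memory on one node can still be a problem with groupByKey, but the number of partitions itself is under your control.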
Joe L wrote:
> I want to apply the following transformations to 60 GB of data on 7 nodes
> with 10 GB of memory each. I am wondering if the groupByKey() function
> returns an RDD with a single partition for each key? If so, what will
> happen if the size of the partition doesn't fit on that particular node?
>
> rdd = sc.textFile("hdfs//.....").map(parserFunc).groupByKey()

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupByKey-returns-a-single-partition-in-a-RDD-tp4264p4307.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.