Hi,

I have two RDDs of the same shape:

RDD1 = (K1, V1)
RDD2 = (K1, V1)

where K1 is an Integer and V1 is a List of Strings,
e.g. (1, List("A", "B", "C")) and (1, List("D", "E", "F")).

I group them by key (groupByKey takes no RDD argument, so I union the
two RDDs first):

RDD1.union(RDD2).groupByKey()

If I keep the size of V1 = 3 (a list of three strings), the groupByKey
operation takes 2.6 min, and
if I keep the size of V1 = 20 (a list of 20 strings), it takes 4.0 min.

How does the size of the value (V1) affect the groupByKey operation? I
expected it to depend only on the number of keys.

I tried the same experiment with spark.shuffle.spill=false and got
similar results.
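For what it's worth, here is a back-of-the-envelope sketch in plain Python (not Spark; the function and record names are hypothetical) of why I'd expect value size to matter: the shuffle behind groupByKey serializes and moves every (key, value) pair to the reducer that owns the key, so the bytes moved grow with the total size of the values, not just with the number of keys.

```python
import pickle

def shuffled_bytes(records):
    # groupByKey shuffles every (key, value) pair to the reducer that
    # owns the key, so the bytes moved are roughly the serialized size
    # of all records, not just of the distinct keys.
    return sum(len(pickle.dumps(r)) for r in records)

def make_records(num_keys, list_size):
    # One (key, list-of-strings) pair per key, mimicking
    # (1, List("A", "B", "C")) with configurable list length.
    return [(k, ["x" * 5] * list_size) for k in range(num_keys)]

small = shuffled_bytes(make_records(100_000, 3))
big = shuffled_bytes(make_records(100_000, 20))
print(big / small)  # larger values -> proportionally more shuffle traffic
```

Same number of keys in both runs, yet the second shuffles several times more data, which would account for the longer wall-clock time.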

Any ideas?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/groupByKey-is-taking-more-time-tp3425.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
