Hi,

I have two RDDs of the same shape, RDD1 = (K1, V1) and RDD2 = (K1, V1), where K1 is an Integer and V1 is a List of Strings, e.g. (1, List("A","B","C")) and (1, List("D","E","F")). I combine them with RDD1.groupByKey(RDD2).

When V1 holds 3 strings, the groupByKey operation takes 2.6 minutes; when V1 holds 20 strings, it takes 4.0 minutes. How does the size of the value (V1) impact the groupByKey operation? I expected it to depend only on the number of keys. I tried the same experiment with spark.shuffle.spill=false and got similar results. Any ideas?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupByKey-is-taking-more-time-tp3425.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
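P.S. In case it helps to reproduce, here is a minimal sketch of the comparison (names, key counts, and record counts are illustrative, assuming Int keys and List[String] values as above; groupByKey shuffles every value in full, so total shuffle bytes grow with value size, not only with the number of distinct keys):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone driver reproducing the timing comparison.
object GroupByKeySizeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("groupByKey-value-size"))

    // Same number of keys in both cases; only the value size differs.
    val small = sc.parallelize(1 to 1000000)
                  .map(i => (i % 1000, List.fill(3)("A")))   // V1 of size 3
    val large = sc.parallelize(1 to 1000000)
                  .map(i => (i % 1000, List.fill(20)("A")))  // V1 of size 20

    // groupByKey serializes and shuffles the entire List for each record,
    // so the second job moves roughly 20/3 times as many value bytes.
    small.groupByKey().count()
    large.groupByKey().count()

    sc.stop()
  }
}
```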