Hi, I have a key-value RDD, called rdd below. After a groupByKey, I tried to count the rows, but the count is not stable: it changes from one run to the next, somehow non-deterministically.
Here is the test code:

    val step1 = ligneReceipt_cleTable.persist
    val step2 = step1.groupByKey

    val s1size = step1.count
    val s2size = step2.count

    val t = step2   // rdd after groupByKey
    val t1 = t.count
    val t2 = t.count
    val t3 = t.count
    val t4 = t.count
    val t5 = t.count
    val t6 = t.count
    val t7 = t.count
    val t8 = t.count

    println("s1size = " + s1size)
    println("s2size = " + s2size)
    println("1 => " + t1)
    println("2 => " + t2)
    println("3 => " + t3)
    println("4 => " + t4)
    println("5 => " + t5)
    println("6 => " + t6)
    println("7 => " + t7)
    println("8 => " + t8)

Here are the results:

    s1size = 5338864
    s2size = 5268001
    1 => 5268002
    2 => 5268001
    3 => 5268001
    4 => 5268002
    5 => 5268001
    6 => 5268002
    7 => 5268002
    8 => 5268001

Even if the difference is just one row, that's annoying. Any idea? Thank you.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/groupBy-gives-non-deterministic-results-tp13698.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
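One detail that may matter: persist is lazy, so step1 is not actually cached until the first action runs on it, and in the code above the groupByKey is wired up before anything has been materialized. A minimal sketch of forcing the cache before grouping, to check whether recomputation of the source is involved (this assumes the same ligneReceipt_cleTable RDD and a live SparkContext; it is a diagnostic sketch, not a claimed fix):

```scala
// Sketch only: materialize step1 into the cache with a count *before*
// building on it, so the subsequent groupByKey reads cached partitions
// instead of possibly recomputing the source on every action.
// ligneReceipt_cleTable is assumed to be the key-value RDD from above.
import org.apache.spark.storage.StorageLevel

val step1 = ligneReceipt_cleTable.persist(StorageLevel.MEMORY_AND_DISK)
val s1size = step1.count            // action: forces the cache to fill

val step2 = step1.groupByKey
val counts = (1 to 8).map(_ => step2.count)

println("s1size = " + s1size)
// If the instability comes from recomputing the un-cached source,
// this should now print a single distinct value.
println("distinct counts after groupByKey: " + counts.distinct)
```

If the counts still drift after the source is pinned in the cache, that would point at something upstream (e.g. the input data itself changing between reads) rather than at groupByKey.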