hi!
It is because of "spark.sql.shuffle.partitions". See the value 200 in the
physical plan at the rangepartitioning:
scala> val df = sc.parallelize(1 to 1000, 10).toDF("v").sort("v")
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [v: int]
scala> df.explain()
== Physical Plan ==
*(2) Sort [v#300 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(v#300 ASC NULLS FIRST, 200), true, [id=#334]
+- *(1) Project [value#297 AS v#300]
+- *(1) SerializeFromObject [input[0, int, false] AS value#297]
+- Scan[obj#296]
scala> df.rdd.getNumPartitions
res13: Int = 200
Best Regards,
Attila
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]