Hello, here is another link discussing the difference between the two methods:
https://stackoverflow.com/questions/33831561/pyspark-repartition-vs-partitionby

In particular, if you are facing possible data skew, or have partitioning parameters that can only be determined at runtime, this article may also help: https://software.intel.com/en-us/articles/spark-sql-adaptive-execution-at-100-tb
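For a bit of context on why skew matters here: `df.repartition(n, col)` hash-partitions rows in memory across executors, while `df.write.partitionBy(col)` only controls the on-disk directory layout. Below is a minimal plain-Python sketch (not Spark code; `partition_for` is a made-up stand-in for Spark's HashPartitioner) showing how hash partitioning on a skewed key piles most rows into a single partition, which is the problem the links above discuss:

```python
from collections import Counter
import zlib

def partition_for(key, num_partitions):
    # Deterministic hash partitioning, similar in spirit to Spark's
    # HashPartitioner (illustrative only; Spark uses its own hash).
    return zlib.crc32(str(key).encode()) % num_partitions

# A skewed key distribution: one "hot" key dominates the dataset.
keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
counts = Counter(partition_for(k, 4) for k in keys)

# All 90 "hot" rows hash to the same partition, so one task ends up
# doing most of the work while the others sit nearly idle.
print(max(counts.values()))  # at least 90
```

Adaptive execution (the second link) mitigates this at runtime by splitting or coalescing such unbalanced partitions after the shuffle statistics are known.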