AFAIK, adaptive shuffle partitioning still isn't ready to be made the default, and there are some corner cases that need to be addressed before this functionality can be declared finished. E.g., the current logic can make data-skew problems worse by folding One Big Partition into an even larger partition before the ExchangeCoordinator decides to start a new one. That can be worked around by changing the logic to "If including the nextShuffleInputSize would exceed the target partition size, then start a new partition": https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala#L173
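To illustrate the difference, here is a toy model (hypothetical, not the actual Spark Scala code) of the two coalescing strategies: the "add first, then check" behavior described above, versus the proposed "check before adding" fix. The function names and sizes are made up for illustration; `sizes` stands in for the per-input shuffle sizes the coordinator sees for consecutive pre-shuffle partitions.

```python
def coalesce_add_then_check(sizes, target):
    """Model of the current behavior: include the next input first,
    then check the target. A skewed input gets merged into the running
    partition, making One Big Partition even bigger."""
    partitions, current = [], 0
    for size in sizes:
        current += size
        if current >= target:
            partitions.append(current)
            current = 0
    if current > 0:
        partitions.append(current)
    return partitions

def coalesce_check_then_add(sizes, target):
    """Model of the proposed fix: if including the next input would
    exceed the target, start a new partition instead."""
    partitions, current = [], 0
    for size in sizes:
        if current > 0 and current + size > target:
            partitions.append(current)
            current = 0
        current += size
    if current > 0:
        partitions.append(current)
    return partitions

sizes = [10, 10, 100, 10]   # one skewed input of size 100
print(coalesce_add_then_check(sizes, 50))   # [120, 10] -- skew made worse
print(coalesce_check_then_add(sizes, 50))   # [20, 100, 10] -- skew isolated
```

With the first strategy the skewed input is merged with its neighbors into a 120-unit partition; with the second it lands in its own partition and the others stay near the target.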
If you're willing to work around those kinds of issues to fit your use case, then adaptive shuffle partitioning can be made to work well even if it is not perfect. It would be nice, though, to see adaptive partitioning finished and hardened to the point where it becomes the default, because a fixed number of shuffle partitions has some significant limitations.

On Tue, Nov 15, 2016 at 12:50 AM, leo9r <lezcano....@gmail.com> wrote:
> That's great insight Mark, I'm looking forward to giving it a try!
>
> According to the Adaptive execution in Spark JIRA
> <https://issues.apache.org/jira/browse/SPARK-9850>, it seems that some
> functionality was added in Spark 1.6.0 and the rest is still in progress.
> Are there any improvements to the Spark SQL adaptive behavior in Spark 2.0+
> that you know of?
>
> Thanks and best regards,
> Leo
>
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-parameters-like-shuffle-partitions-should-be-stored-in-the-lineage-tp13240p19885.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.