AFAIK, the adaptive shuffle partitioning still isn't completely ready to be
made the default, and some corner cases still need to be addressed before
this functionality can be declared finished.  E.g., the current logic can
make data skew problems worse by turning One Big Partition into an even
larger partition before the ExchangeCoordinator decides to start a new
one.  That can be worked around by changing the logic to "if including the
nextShuffleInputSize would exceed the target partition size, then start a
new partition":
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ExchangeCoordinator.scala#L173
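
The workaround is easy to sketch. The following Python snippet is a
standalone illustration of that "check before adding" variant of the
partition-coalescing loop (it is not the actual Scala code from
ExchangeCoordinator, and the function name is mine): a coalesced partition
is closed as soon as including the next shuffle input size would push it
past the target, so an already-oversized partition is never grown further.

```python
def estimate_partition_start_indices(input_sizes, target_size):
    """Coalesce adjacent post-shuffle partitions.

    input_sizes: per-partition shuffle input sizes in bytes.
    target_size: desired maximum size of a coalesced partition.
    Returns the start indices of the coalesced partitions.
    """
    start_indices = [0]
    current = 0
    for i, size in enumerate(input_sizes):
        # Proposed fix: if including the next input size would exceed the
        # target, close the current partition *before* adding it, instead
        # of adding it first and only then deciding to start a new one.
        if i > 0 and current + size > target_size:
            start_indices.append(i)
            current = 0
        current += size
    return start_indices
```

With a skewed input such as sizes `[10, 30, 70, 5, 200, 10]` and a
100-byte target, this yields start indices `[0, 2, 4, 5]`: the 200-byte
skewed partition ends up alone in its coalesced partition rather than
being merged with its neighbors first.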

If you're willing to work around those kinds of issues to fit your use
case, then adaptive shuffle partitioning can be made to work well even if
it is not perfect.  It would be nice, though, to see adaptive partitioning
finished and hardened to the point where it becomes the default, because a
fixed number of shuffle partitions has some significant limitations.

On Tue, Nov 15, 2016 at 12:50 AM, leo9r <lezcano....@gmail.com> wrote:

> That's great insight Mark, I'm looking forward to giving it a try!
>
> According to the JIRA issue  Adaptive execution in Spark
> <https://issues.apache.org/jira/browse/SPARK-9850>, it seems that some
> functionality was added in Spark 1.6.0 and the rest is still in progress.
> Are there any improvements to the Spark SQL adaptive behavior in Spark
> 2.0+ that you know of?
>
> Thanks and best regards,
> Leo
>
>
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-parameters-like-shuffle-partitions-should-be-stored-in-the-lineage-tp13240p19885.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
