Take a look at spark.sql.adaptive.enabled and the ExchangeCoordinator. A single, fixed-size spark.sql.shuffle.partitions is not the only way to control the number of partitions in an Exchange -- if you are willing to rely on a code path that is still off by default.
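For what it's worth, here is a minimal sketch of what that looks like. It assumes the Spark 2.x-era property names (the target-size property was later renamed), so treat the exact keys and the 64 MB value as illustrative rather than a recommendation:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("adaptive-shuffle-sketch")
  // Let the ExchangeCoordinator choose the number of post-shuffle partitions
  // at runtime instead of using one fixed spark.sql.shuffle.partitions value.
  .config("spark.sql.adaptive.enabled", "true")
  // Advisory bytes per post-shuffle partition; small shuffle partitions are
  // coalesced up to roughly this size. (2.x-era name; assumed here.)
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "67108864") // 64 MB
  .getOrCreate()

// Joins and aggregations planned by this session get their post-shuffle
// partition counts from the coordinator rather than the global setting.
```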
On Mon, Nov 14, 2016 at 4:19 PM, leo9r <lezcano....@gmail.com> wrote:
> Hi Daniel,
>
> I completely agree with your request. As the amount of data being processed
> with SparkSQL grows, tweaking sql.shuffle.partitions becomes a common need
> to prevent OOM and performance degradation. The fact that
> sql.shuffle.partitions cannot be set several times in the same job/action,
> because of the reason you explain, is a big inconvenience for the development
> of ETL pipelines.
>
> Have you got any answer or feedback in this regard?
>
> Thanks,
> Leo Lezcano
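To illustrate the limitation Leo describes: spark.sql.shuffle.partitions is read when each query is planned, so it can only be varied between actions, not per shuffle inside one action. A hedged sketch of the usual workarounds, with made-up dataset names and output paths:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("per-action-shuffle-partitions").getOrCreate()
import spark.implicits._

// Illustrative data; "events", "key", and the /tmp paths are examples only.
val events = spark.range(0L, 1000000L).withColumn("key", $"id" % 1000)

// Fewer reducers for a small aggregation: the setting is picked up when this
// query is planned and executed.
spark.conf.set("spark.sql.shuffle.partitions", "50")
events.groupBy("key").count().write.mode("overwrite").parquet("/tmp/small_agg")

// More reducers for a heavier self-join later in the same pipeline.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
events.join(events.select($"key"), "key").write.mode("overwrite").parquet("/tmp/big_join")

// Within a single action the global setting cannot differ per shuffle; an
// explicit repartition(n) on the intermediate DataFrame is the usual
// per-stage alternative.
```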