Take a look at spark.sql.adaptive.enabled and the ExchangeCoordinator. A single, fixed-size spark.sql.shuffle.partitions is not the only way to control the number of partitions in an Exchange -- if you are willing to rely on a code path that is still off by default.
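For what it's worth, here is a minimal sketch of what that looks like. It assumes the Spark 2.x-era property names (the target-size property was later renamed), so treat the exact keys and the 64 MB value as illustrative rather than a recommendation:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("adaptive-shuffle-sketch")
  // Let the ExchangeCoordinator choose the number of post-shuffle partitions
  // at runtime instead of using one fixed spark.sql.shuffle.partitions value.
  .config("spark.sql.adaptive.enabled", "true")
  // Advisory bytes per post-shuffle partition; small shuffle partitions are
  // coalesced up to roughly this size. (2.x-era name; assumed here.)
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "67108864") // 64 MB
  .getOrCreate()

// Joins and aggregations planned by this session get their post-shuffle
// partition counts from the coordinator rather than the global setting.
```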
On Mon, Nov 14, 2016 at 4:19 PM, leo9r <lezcano....@gmail.com> wrote:
> Hi Daniel,
>
> I completely agree with your request. As the amount of data being processed
> with SparkSQL grows, tweaking sql.shuffle.partitions becomes a common need
> to prevent OOM and performance degradation. The fact that
> sql.shuffle.partitions cannot be set several times in the same job/action,
> because of the reason you explain, is a big inconvenience for the development
> of ETL pipelines.
>
> Have you got any answer or feedback in this regard?
>
> Thanks,
> Leo Lezcano
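To illustrate the limitation Leo describes: spark.sql.shuffle.partitions is read when each query is planned, so it can only be varied between actions, not per shuffle inside one action. A hedged sketch of the usual workarounds, with made-up dataset names and output paths:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("per-action-shuffle-partitions").getOrCreate()
import spark.implicits._

// Illustrative data; "events", "key", and the /tmp paths are examples only.
val events = spark.range(0L, 1000000L).withColumn("key", $"id" % 1000)

// Fewer reducers for a small aggregation: the setting is picked up when this
// query is planned and executed.
spark.conf.set("spark.sql.shuffle.partitions", "50")
events.groupBy("key").count().write.mode("overwrite").parquet("/tmp/small_agg")

// More reducers for a heavier self-join later in the same pipeline.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
events.join(events.select($"key"), "key").write.mode("overwrite").parquet("/tmp/big_join")

// Within a single action the global setting cannot differ per shuffle; an
// explicit repartition(n) on the intermediate DataFrame is the usual
// per-stage alternative.
```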