You still have the problem that, even within a single Job, not every Exchange necessarily wants the same number of shuffle partitions.
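As a sketch of the problem (assuming a running SparkSession `spark`; the table and column names here are hypothetical, while `spark.sql.shuffle.partitions` and `Dataset.repartition` are the real config key and API):

```scala
// The session-level setting applies to every Exchange the planner inserts.
spark.conf.set("spark.sql.shuffle.partitions", "200")

val big   = spark.table("events")      // hypothetical large table
val small = spark.table("dimensions")  // hypothetical small table

// Both the join's shuffle and the aggregation's shuffle get 200 partitions,
// even though the post-aggregation output may be far smaller.
val joined = big.join(small, "key")
val agg    = joined.groupBy("key").count()

// Partial workaround: an explicit repartition picks a different partition
// count, but it does so by adding another Exchange after the aggregation
// rather than changing the number the planner chose for the aggregation's
// own shuffle.
val smaller = agg.repartition(20)
```

This is why a per-Exchange (or lineage-scoped) setting, rather than a single session-wide value, keeps coming up.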
On Tue, Nov 15, 2016 at 2:46 AM, Sean Owen <so...@cloudera.com> wrote:

> Once you get to needing this level of fine-grained control, should you not
> consider using the programmatic API in part, to let you control individual
> jobs?
>
> On Tue, Nov 15, 2016 at 1:19 AM leo9r <lezcano....@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> I completely agree with your request. As the amount of data being
>> processed with SparkSQL grows, tweaking sql.shuffle.partitions becomes a
>> common need to prevent OOM and performance degradation. The fact that
>> sql.shuffle.partitions cannot be set several times in the same
>> job/action, because of the reason you explain, is a big inconvenience
>> for the development of ETL pipelines.
>>
>> Have you got any answer or feedback in this regard?
>>
>> Thanks,
>> Leo Lezcano
>>
>> --
>> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-parameters-like-shuffle-partitions-should-be-stored-in-the-lineage-tp13240p19867.html
>> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.