You still have the problem that, even within a single Job, not every Exchange necessarily wants the same number of shuffle partitions.
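As a sketch of the problem (assuming a running SparkSession `spark`; the table and column names here are hypothetical, while `spark.sql.shuffle.partitions` and `Dataset.repartition` are the real config key and API):

```scala
// The session-level setting applies to every Exchange the planner inserts.
spark.conf.set("spark.sql.shuffle.partitions", "200")

val big   = spark.table("events")      // hypothetical large table
val small = spark.table("dimensions")  // hypothetical small table

// Both the join's shuffle and the aggregation's shuffle get 200 partitions,
// even though the post-aggregation output may be far smaller.
val joined = big.join(small, "key")
val agg    = joined.groupBy("key").count()

// Partial workaround: an explicit repartition picks a different partition
// count, but it does so by adding another Exchange after the aggregation
// rather than changing the number the planner chose for the aggregation's
// own shuffle.
val smaller = agg.repartition(20)
```

This is why a per-Exchange (or lineage-scoped) setting, rather than a single session-wide value, keeps coming up.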
On Tue, Nov 15, 2016 at 2:46 AM, Sean Owen <so...@cloudera.com> wrote:

> Once you get to needing this level of fine-grained control, should you not
> consider using the programmatic API in part, to let you control individual
> jobs?
>
> On Tue, Nov 15, 2016 at 1:19 AM leo9r <lezcano....@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> I completely agree with your request. As the amount of data being
>> processed with SparkSQL grows, tweaking sql.shuffle.partitions becomes a
>> common need to prevent OOM and performance degradation. The fact that
>> sql.shuffle.partitions cannot be set several times in the same
>> job/action, because of the reason you explain, is a big inconvenience
>> for the development of ETL pipelines.
>>
>> Have you got any answer or feedback in this regard?
>>
>> Thanks,
>> Leo Lezcano
>>
>> --
>> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-parameters-like-shuffle-partitions-should-be-stored-in-the-lineage-tp13240p19867.html
>> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.