[
https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667665#comment-15667665
]
Imran Rashid commented on SPARK-9850:
-------------------------------------
[~assaf.mendelson] reducers already have to wait for the last mapper to finish.
Spark has always behaved this way. (I think you might find discussions
referring to this as the "stage barrier".) I don't see that changing anytime
soon -- while it's not ideal, doing away with it would add a lot of complexity.
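To illustrate the stage barrier described above, here is a minimal toy simulation in plain Python (not Spark itself; the task functions and thread pool are stand-ins for map/reduce tasks): every reduce task blocks until all map tasks finish, because each reducer needs a piece of every mapper's output.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_stage_with_barrier(map_tasks, reduce_tasks):
    """Toy model of the stage barrier: reduce tasks wait until ALL
    map tasks have finished, even if most mappers are fast."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        map_futures = [pool.submit(t) for t in map_tasks]
        # Barrier: block on every map output before starting reducers.
        map_outputs = [f.result() for f in map_futures]
        reduce_futures = [pool.submit(t, map_outputs) for t in reduce_tasks]
        return [f.result() for f in reduce_futures]

# Hypothetical workload: one slow mapper delays every reducer.
fast = lambda: 1
slow = lambda: (time.sleep(0.2), 2)[1]  # sleeps, then returns 2
start = time.time()
results = run_stage_with_barrier([fast, fast, slow],
                                 [lambda outs: sum(outs)])
elapsed = time.time() - start
```

Even though two of the three mappers finish immediately, `elapsed` is at least the slow mapper's 0.2 s, which is the cost the comment is pointing at.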
> Adaptive execution in Spark
> ---------------------------
>
> Key: SPARK-9850
> URL: https://issues.apache.org/jira/browse/SPARK-9850
> Project: Spark
> Issue Type: Epic
> Components: Spark Core, SQL
> Reporter: Matei Zaharia
> Assignee: Yin Huai
> Attachments: AdaptiveExecutionInSpark.pdf
>
>
> Query planning is one of the main factors in high performance, but the
> current Spark engine requires the execution DAG for a job to be set in
> advance. Even with cost-based optimization, it is hard to know the behavior
> of data and user-defined functions well enough to always get great execution
> plans. This JIRA proposes to add adaptive query execution, so that the engine
> can change the plan for each query as it sees what data earlier stages
> produced.
> We propose adding this to Spark SQL / DataFrames first, using a new API in
> the Spark engine that lets libraries run DAGs adaptively. In future JIRAs,
> the functionality could be extended to other libraries or the RDD API, but
> that is more difficult than adding it in SQL.
> I've attached a design doc by Yin Huai and myself explaining how it would
> work in more detail.
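The kind of runtime re-planning proposed here can be sketched with a toy example: instead of committing to a join strategy from pre-execution estimates, the engine inspects the actual shuffle output sizes of earlier stages and then picks a strategy. The function name, threshold, and strategy labels below are illustrative assumptions, not the design in the attached doc.

```python
# Hypothetical cutoff below which one join side is cheap to broadcast.
BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB

def choose_join_strategy(side_a_bytes, side_b_bytes):
    """Pick a join strategy *after* observing real stage output sizes,
    rather than trusting pre-execution cardinality estimates."""
    smaller = min(side_a_bytes, side_b_bytes)
    if smaller <= BROADCAST_THRESHOLD:
        # Small enough to ship the whole side to every task.
        return "broadcast-hash-join"
    # Both sides are large; shuffle and sort both.
    return "sort-merge-join"
```

For example, if the planner estimated both join inputs as large but a filter left only 2 MB on one side at runtime, an adaptive engine could switch from a sort-merge join to a broadcast join for the remaining stages.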