Interesting point. As I understand it, the ShuffleManager ensures that a reduce task reads exactly one map output file per map partition, even when multiple speculative attempts succeed; it is not a random selection. As for which copy, if I am correct, much like other classical cases, Spark keeps the output of the attempt that completes first: the first successful attempt's output is registered and used, and the output of any later speculative attempt is ignored. This makes sense, as the reduce stage can then proceed with the earliest available data, minimizing the impact of speculative execution on job completion time, which is another important factor.
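To make that bookkeeping concrete, here is a minimal Scala sketch of the "first successful attempt wins" idea. The names (MapOutputRegistry, MapOutputInfo, registerIfFirst) are hypothetical illustrations, not Spark's actual internals; in Spark itself this tracking involves the scheduler and MapOutputTracker:

    import scala.collection.concurrent.TrieMap

    // Hypothetical record of where one map partition's shuffle output lives.
    case class MapOutputInfo(attemptId: Int, shuffleFile: String)

    class MapOutputRegistry {
      // partitionId -> output of the first attempt to finish successfully
      private val outputs = TrieMap.empty[Int, MapOutputInfo]

      // Register an attempt's output only if no earlier attempt has already
      // completed this partition; returns true if this attempt was registered.
      def registerIfFirst(partitionId: Int, info: MapOutputInfo): Boolean =
        outputs.putIfAbsent(partitionId, info).isEmpty

      // The reduce side resolves exactly one output per map partition.
      def outputFor(partitionId: Int): Option[MapOutputInfo] =
        outputs.get(partitionId)
    }

    // Two attempts of the same map partition both succeed:
    val registry = new MapOutputRegistry
    registry.registerIfFirst(0, MapOutputInfo(1, "shuffle_0_map0_attempt1"))  // true: used
    registry.registerIfFirst(0, MapOutputInfo(2, "shuffle_0_map0_attempt2"))  // false: ignored
    registry.outputFor(0)  // Some(MapOutputInfo(1, ...)): reducers read attempt 1's file

For reference, speculation itself is enabled with spark.speculation=true (tuned via spark.speculation.interval, spark.speculation.multiplier and spark.speculation.quantile).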
HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London, United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh


On Thu, 21 Dec 2023 at 17:51, Enrico Minack <i...@enrico.minack.dev> wrote:

> Hi Spark devs,
>
> I have a question around ShuffleManager: With speculative execution, one
> map output file is being created multiple times (by multiple task
> attempts). If both attempts succeed, which is to be read by the reduce
> task in the next stage? Is any map output as good as any other?
>
> Thanks for clarification,
> Enrico