Re: [DISCUSS] Spark cannot identify the problem executor

Yi Wu Fri, 11 Sep 2020 06:25:13 -0700

What do you mean by "read from executor A"? I can think of several paths
for an executor to read something from another remote executor:

1. shuffle data
If the executor fails to fetch the shuffle data, I think it will result in
a FetchFailed for the task. In this case, the blacklist can identify the
problematic executor A if
spark.blacklist.application.fetchFailure.enabled=true.

2. RDD block
If the executor fails to fetch RDD blocks, I think the task would just do
the computation by itself instead of failing.

3. Broadcast block
If the executor fails to fetch the broadcast block, the task seems to fail,
and the blacklist doesn't handle this case well.
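
For case 1, here is a rough sketch of how the relevant settings could be
passed to spark-submit. It assumes the Spark 2.x property names:
spark.blacklist.application.fetchFailure.enabled is the one mentioned above,
and spark.blacklist.enabled is the general switch that I assume is already
on in your setup. The helper name to_submit_args is made up for
illustration and is not part of Spark.

```python
import shlex

# Spark 2.x blacklist properties discussed in this thread.
# spark.blacklist.enabled is assumed to be on already; the fetchFailure
# flag is the one that lets a shuffle fetch failure blacklist executor A.
BLACKLIST_CONF = {
    "spark.blacklist.enabled": "true",
    "spark.blacklist.application.fetchFailure.enabled": "true",
}


def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments.

    Hypothetical helper for illustration only; not a Spark API.
    """
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the flags to append to a spark-submit command line.
    print(" ".join(shlex.quote(a) for a in to_submit_args(BLACKLIST_CONF)))
```

This only covers the shuffle fetch path; it does not change the behavior
for RDD blocks or broadcast blocks described in cases 2 and 3.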

Thanks,
Yi

On Fri, Sep 11, 2020 at 8:43 PM Sean Owen <sro...@gmail.com> wrote:

> -dev, +user
> Executors do not communicate directly, so I don't think that's quite
> what you are seeing. You'd have to clarify.
>
> On Fri, Sep 11, 2020 at 12:08 AM 陈晓宇 <xychen0...@gmail.com> wrote:
> >
> > Hello all,
> >
> > We've been using Spark 2.3 with blacklist enabled and often hit the
> > problem that when executor A has some problem (like a connection issue),
> > tasks on executor B and executor C fail saying they cannot read from
> > executor A. Finally the job fails because a task on executor B failed
> > 4 times.
> >
> > I wonder whether there is any existing fix or discussion on how to
> > identify executor A as the problem node.
> >
> > Thanks
