What do you mean by "read from executor A"? I can think of several paths for an executor to read something from another remote executor:

1. Shuffle data: If the executor fails to fetch shuffle data, I think the task will fail with a FetchFailed error. In this case, the blacklist can identify the problematic executor A if spark.blacklist.application.fetchFailure.enabled=true.
2. RDD blocks: If the executor fails to fetch RDD blocks, I think the task would just compute them itself instead of failing.
3. Broadcast blocks: If the executor fails to fetch a broadcast block, the task seems to fail, and the blacklist doesn't handle this case well.

Thanks,
Yi

On Fri, Sep 11, 2020 at 8:43 PM Sean Owen <sro...@gmail.com> wrote:
> -dev, +user
> Executors do not communicate directly, so I don't think that's quite
> what you are seeing. You'd have to clarify.
>
> On Fri, Sep 11, 2020 at 12:08 AM 陈晓宇 <xychen0...@gmail.com> wrote:
> >
> > Hello all,
> >
> > We've been using Spark 2.3 with the blacklist enabled and often hit a
> > problem: when executor A has an issue (such as a connection problem),
> > tasks on executors B and C fail, saying they cannot read from executor A.
> > Eventually the job fails because a task on executor B failed 4 times.
> >
> > I wonder whether there is an existing fix, or any discussion of how to
> > identify executor A as the problem node.
> >
> > Thanks