Re: [DISCUSS] FLIP-224: Blacklist Mechanism

Lijie Wang Mon, 16 May 2022 08:22:47 -0700

Hi Konstantin,

Maybe change it to the following:


1. POST: http://{jm_rest_address:port}/blocklist/taskmanagers/{id}
Merge is not allowed. If the {id} already exists, return an error. Otherwise,
create a new item.

2. POST: http://{jm_rest_address:port}/blocklist/taskmanagers/{id}:merge
Merge is allowed. If the {id} already exists, merge. Otherwise, create a
new item.
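
To make the difference concrete, here is a minimal Python sketch of the two
behaviors. It is only an illustration: the names (Blocklist, BlockedItem,
create, merge) are hypothetical and not part of the FLIP, and it assumes that
"merge" simply lets the new request override the existing item.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BlockedItem:
    # Hypothetical item shape; the field names are illustrative only.
    cause: str
    end_timestamp_ms: Optional[int]  # None means the item never expires


class ItemAlreadyExists(Exception):
    """Raised by the plain POST when the {id} is already blocked."""


class Blocklist:
    def __init__(self) -> None:
        self._items: Dict[str, BlockedItem] = {}

    def create(self, tm_id: str, item: BlockedItem) -> BlockedItem:
        # 1. POST .../blocklist/taskmanagers/{id}: merge is not allowed.
        if tm_id in self._items:
            raise ItemAlreadyExists(tm_id)
        self._items[tm_id] = item
        return item

    def merge(self, tm_id: str, item: BlockedItem) -> BlockedItem:
        # 2. POST .../blocklist/taskmanagers/{id}:merge: create the item if
        # it is absent, otherwise merge. Assumed merge rule: the new request
        # overrides the existing item.
        self._items[tm_id] = item
        return item
```

With this shape, a client that wants to extend the timeout of an already
blocked task manager calls the :merge variant, while the plain POST protects
against accidentally overwriting an existing item.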

WDYT?

Best,
Lijie

Konstantin Knauf <[email protected]> wrote on Mon, May 16, 2022 at 20:07:

> Hi Lijie,
>
> hm, maybe the following is more appropriate in that case
>
> POST: http://{jm_rest_address:port}/blocklist/taskmanagers/{id}:merge
>
> Best,
>
> Konstantin
>
> On Mon, May 16, 2022 at 07:05, Lijie Wang <
> [email protected]> wrote:
>
> > Hi Konstantin,
> > thanks for your feedback.
> >
> > From what I understand, PUT should be idempotent. However, we have a
> > *timeout* field in the request. This means that issuing the same request
> > at two different times will lead to different resource states (the
> > removal timestamps of the items will be different).
> >
> > Should we use PUT in this case? WDYT?
> >
> > Best,
> > Lijie
> >
> > Konstantin Knauf <[email protected]> wrote on Fri, May 13, 2022 at 17:20:
> >
> > > Hi Lijie,
> > >
> > > wouldn't the REST API-idiomatic way for an update/replace be a PUT on
> > > the resource?
> > >
> > > PUT: http://{jm_rest_address:port}/blocklist/taskmanagers/{id}
> > >
> > > Best,
> > >
> > > Konstantin
> > >
> > >
> > >
> > > On Fri, May 13, 2022 at 11:01, Lijie Wang <
> > > [email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I've had an offline discussion with Becket Qin and Zhu Zhu, and made
> > > > the following changes to the REST API:
> > > > 1. To avoid ambiguity, only one of *timeout* and *endTimestamp* can be
> > > > specified. If both are specified, an error is returned.
> > > > 2. If the specified item is already there, the *ADD* operation has two
> > > > behaviors: *return error* (the default) or *merge/update*, and we add a
> > > > flag to the request body to control it. You can find more details in
> > > > the "Public Interface" section.
> > > >
> > > > If there is no more feedback, we will start the vote thread next week.
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > Lijie Wang <[email protected]> wrote on Tue, May 10, 2022 at 17:14:
> > > >
> > > > > Hi Becket Qin,
> > > > >
> > > > > Thanks for your suggestions. I have moved the description of
> > > > > configurations, metrics and REST API into the "Public Interface"
> > > > > section, and made a few updates according to your suggestion. In
> > > > > this FLIP, there are no public Java interfaces or pluggables that
> > > > > users need to implement themselves.
> > > > >
> > > > > Answers to your questions:
> > > > > 1. Yes, there are 2 block actions: MARK_BLOCKED and
> > > > > MARK_BLOCKED_AND_EVACUATE_TASKS (they have been renamed). Currently,
> > > > > block items can only be added through the REST API, so these 2
> > > > > actions are mentioned in the REST API part (the REST API part has
> > > > > now been moved to the public interface).
> > > > > 2. I agree with you. I have changed the "Cause" field to String, and
> > > > > allow users to specify it via the REST API.
> > > > > 3. Yes, it is useful to allow different timeouts. As mentioned
> > > > > above, we will introduce 2 fields, *timeout* and *endTimestamp*,
> > > > > into the ADD REST API to specify when to remove the blocked item.
> > > > > These 2 fields are optional; if neither is specified, the blocked
> > > > > item is permanent and will not be removed. If both are specified,
> > > > > the minimum of *currentTimestamp+timeout* and *endTimestamp* will be
> > > > > used as the time to remove the blocked item. To keep the
> > > > > configuration minimal, we have removed the
> > > > > *cluster.resource-blocklist.item.timeout* configuration option.
> > > > > 4. Yes, the block item will be overridden if the specified item
> > > > > already exists. The ADD operation is *ADD or UPDATE*.
> > > > > 5. Yes. On the JM/RM side, all the blocklist information is
> > > > > maintained in JMBlocklistHandler/RMBlocklistHandler. The blocklist
> > > > > handler (or abstracted to other interfaces) will be propagated to
> > > > > different components.
> > > > >
> > > > > Best,
> > > > > Lijie
> > > > >
> > > > > Becket Qin <[email protected]> wrote on Tue, May 10, 2022 at 11:26:
> > > > >
> > > > >> Hi Lijie,
> > > > >>
> > > > > >> Thanks for updating the FLIP. It looks like the public interface
> > > > > >> section did not fully reflect all the user-perceivable behavior
> > > > > >> and API. Can you put everything that users may be aware of there?
> > > > > >> That would include the REST API, metrics, configurations, public
> > > > > >> Java interfaces or pluggables that users may see or implement by
> > > > > >> themselves, as well as a brief summary of the behavior of the
> > > > > >> public API.
> > > > >>
> > > > >> Besides that, I have a few questions:
> > > > >>
> > > > > >> 1. According to the conversation in the discussion thread, it
> > > > > >> looks like the BlockAction will have "MARK_BLOCKLISTED" and
> > > > > >> "MARK_BLOCKLISTED_AND_EVACUATE_TASKS". Is that the case? If so,
> > > > > >> can you add that to the public interface as well?
> > > > > >>
> > > > > >> 2. At this point, the "Cause" field in the BlockingItem is a
> > > > > >> Throwable and is not reflected in the REST API. Should that be
> > > > > >> included in the query response? And should we change that field
> > > > > >> to be a String so users may specify the cause via the REST API
> > > > > >> when they block some nodes / TMs?
> > > > > >>
> > > > > >> 3. Would it be useful to allow users to have different timeouts
> > > > > >> for different blocked items? So while there is a default timeout,
> > > > > >> users can also override it via the REST API when they block an
> > > > > >> entity.
> > > > > >>
> > > > > >> 4. Regarding the ADD operation, if the specified item is already
> > > > > >> there, will the block item be overridden? For example, if the
> > > > > >> user wants to extend the timeout of a blocked item, can they just
> > > > > >> issue an ADD command again?
> > > > > >>
> > > > > >> 5. I am not quite familiar with the details of this, but is there
> > > > > >> a source of truth for the blocked list? I think it might be good
> > > > > >> to have a single source of truth for the blocked list and just
> > > > > >> propagate that list to different components to take the action of
> > > > > >> actually blocking the resource.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jiangjie (Becket) Qin
> > > > >>
> > > > > >> On Mon, May 9, 2022 at 5:54 PM Lijie Wang <[email protected]>
> > > > > >> wrote:
> > > > >>
> > > > >> > Hi everyone,
> > > > >> >
> > > > > >> > Based on the discussion in the mailing list, I updated the FLIP
> > > > > >> > doc. The changes include:
> > > > > >> > 1. Changed the description of the motivation section to more
> > > > > >> > clearly describe the problem this FLIP is trying to solve.
> > > > > >> > 2. Only blocking *manually* is supported.
> > > > > >> > 3. Adopted some suggestions, such as *endTimestamp*.
> > > > >> >
> > > > >> > Best,
> > > > >> > Lijie
> > > > >> >
> > > > >> >
> > > > > >> > Roman Boyko <[email protected]> wrote on Sat, May 7, 2022 at 19:25:
> > > > >> >
> > > > >> > > Hi Lijie!
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > *a) “Probably storing inside Zookeeper/Configmap might be
> > > > >> helpfulhere.”
> > > > >> > > Can you explain it in detail? I don't fully understand that.
> In
> > > > >> > myopinion,
> > > > >> > > non-active and active are the same, and no special treatment
> > > > >> isrequired.*
> > > > >> > >
> > > > > >> > > Sorry, this was a misunderstanding on my side. I thought we
> > > > > >> > > were talking about the HA mode (not about the Active and
> > > > > >> > > Standalone ResourceManager). The original question was: how
> > > > > >> > > to handle the list of blacklisted nodes at the moment of a
> > > > > >> > > leader change? Should we simply forget about them or try to
> > > > > >> > > pre-save that list on the remote storage?
> > > > >> > >
> > > > > >> > > On Sat, 7 May 2022 at 10:51, Yang Wang <[email protected]>
> > > > > >> > > wrote:
> > > > >> > >
> > > > > >> > > > Thanks Lijie and Zhu Zhu for the explanation.
> > > > > >> > > >
> > > > > >> > > > I just overlooked the "MARK_BLOCKLISTED". At the task
> > > > > >> > > > level, it is indeed functionality that external tools
> > > > > >> > > > (e.g. kubectl taint) cannot support.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Yang
> > > > >> > > >
> > > > > >> > > > Lijie Wang <[email protected]> wrote on Fri, May 6, 2022 at 22:18:
> > > > >> > > >
> > > > >> > > > > Thanks for your feedback, Jiangang and Martijn.
> > > > >> > > > >
> > > > >> > > > > @Jiangang
> > > > >> > > > >
> > > > >> > > > >
> > > > > >> > > > > > For auto-detecting, I wonder how to make the strategy
> > > > > >> > > > > > and mark a node blocked?
> > > > > >> > > > >
> > > > > >> > > > > In fact, we currently plan not to support auto-detection
> > > > > >> > > > > in this FLIP. The part about auto-detection may be
> > > > > >> > > > > continued in a separate FLIP in the future. Some others
> > > > > >> > > > > have the same concerns as you, and the correctness and
> > > > > >> > > > > necessity of auto-detection may require further
> > > > > >> > > > > discussion in the future.
> > > > >> > > > >
> > > > > >> > > > > > In session mode, multi jobs can fail on the same bad
> > > > > >> > > > > > node and the node should be marked blocked.
> > > > > >> > > > > By design, the blocklist information will be shared among
> > > > > >> > > > > all jobs in a cluster/session. The JM will sync blocklist
> > > > > >> > > > > information with the RM.
> > > > >> > > > >
> > > > >> > > > > @Martijn
> > > > >> > > > >
> > > > >> > > > > > I agree with Yang Wang on this.
> > > > > >> > > > > As Zhu Zhu and I mentioned above, we think
> > > > > >> > > > > MARK_BLOCKLISTED (which just limits the load of the node
> > > > > >> > > > > and does not kill all the processes on it) is also
> > > > > >> > > > > important, and we think that external systems (*yarn
> > > > > >> > > > > rmadmin or kubectl taint*) cannot support it. So we think
> > > > > >> > > > > it makes sense even if only *manually*.
> > > > >> > > > >
> > > > > >> > > > > > I also agree with Chesnay that magical mechanisms are
> > > > > >> > > > > > indeed super hard to get right.
> > > > > >> > > > > Yes, as you see, Jiangang (and a few others) have the
> > > > > >> > > > > same concern. However, we currently plan not to support
> > > > > >> > > > > auto-detection in this FLIP, and only *manually*. In
> > > > > >> > > > > addition, I'd like to say that the FLIP provides a
> > > > > >> > > > > mechanism to support MARK_BLOCKLISTED and
> > > > > >> > > > > MARK_BLOCKLISTED_AND_EVACUATE_TASKS, so the auto-detection
> > > > > >> > > > > may be done by external systems.
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > > Lijie
> > > > >> > > > >
> > > > > >> > > > > Martijn Visser <[email protected]> wrote on Fri, May 6, 2022 at 19:04:
> > > > >> > > > >
> > > > > >> > > > > > > If we only support to block nodes manually, then I
> > > > > >> > > > > > > could not see the obvious advantages compared with
> > > > > >> > > > > > > current SRE's approach (via *yarn rmadmin or kubectl
> > > > > >> > > > > > > taint*).
> > > > > >> > > > > >
> > > > > >> > > > > > I agree with Yang Wang on this.
> > > > > >> > > > > >
> > > > > >> > > > > > > To me this sounds yet again like one of those magical
> > > > > >> > > > > > > mechanisms that will rarely work just right.
> > > > > >> > > > > >
> > > > > >> > > > > > I also agree with Chesnay that magical mechanisms are
> > > > > >> > > > > > indeed super hard to get right.
> > > > >> > > > > >
> > > > >> > > > > > Best regards,
> > > > >> > > > > >
> > > > >> > > > > > Martijn
> > > > >> > > > > >
> > > > > >> > > > > > On Fri, 6 May 2022 at 12:03, Jiangang Liu <[email protected]>
> > > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > > >> > > > > >> Thanks for the valuable design. Auto-detection can
> > > > > >> > > > > >> save us a great deal of work. We have implemented a
> > > > > >> > > > > >> similar feature in our internal Flink version.
> > > > > >> > > > > >> Below are the things that I care about:
> > > > > >> > > > > >>
> > > > > >> > > > > >>    1. For auto-detecting, I wonder how to make the
> > > > > >> > > > > >>    strategy and mark a node blocked? Sometimes the
> > > > > >> > > > > >>    node to be blocked is hard to detect; for example,
> > > > > >> > > > > >>    the upper node or the down node will be blocked
> > > > > >> > > > > >>    when the network is unreachable.
> > > > > >> > > > > >>    2. I see that the strategy is made on the JobMaster
> > > > > >> > > > > >>    side. How about implementing similar logic in the
> > > > > >> > > > > >>    resource manager? In session mode, multi jobs can
> > > > > >> > > > > >>    fail on the same bad node and the node should be
> > > > > >> > > > > >>    marked blocked. If the job makes the strategy, the
> > > > > >> > > > > >>    node may not be marked blocked if the failure count
> > > > > >> > > > > >>    doesn't exceed the threshold.
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > > >> > > > > >> Zhu Zhu <[email protected]> wrote on Thu, May 5, 2022 at 23:35:
> > > > >> > > > > >>
> > > > >> > > > > >> > Thank you for all your feedback!
> > > > >> > > > > >> >
> > > > > >> > > > > >> > Besides the answers from Lijie, I'd like to share
> > > > > >> > > > > >> > some of my thoughts:
> > > > > >> > > > > >> > 1. Whether to enable automatic blocklisting
> > > > > >> > > > > >> > Generally speaking, it is not a goal of FLIP-224.
> > > > > >> > > > > >> > The automatic way should be something built upon the
> > > > > >> > > > > >> > blocklist mechanism and well decoupled. It was
> > > > > >> > > > > >> > designed to be a configurable blocklist strategy,
> > > > > >> > > > > >> > but I think we can further decouple it by
> > > > > >> > > > > >> > introducing an abnormal node detector, as Becket
> > > > > >> > > > > >> > suggested, which just uses the blocklist mechanism
> > > > > >> > > > > >> > once bad nodes are detected. However, it should be a
> > > > > >> > > > > >> > separate FLIP with further dev discussions and
> > > > > >> > > > > >> > feedback from users. I also agree with Becket that
> > > > > >> > > > > >> > different users have different requirements, and we
> > > > > >> > > > > >> > should listen to them.
> > > > > >> > > > > >> >
> > > > > >> > > > > >> > 2. Is it enough to just take away abnormal nodes
> > > > > >> > > > > >> > externally
> > > > > >> > > > > >> > My answer is no. As Lijie has mentioned, we need a
> > > > > >> > > > > >> > way to avoid deploying tasks to temporarily hot
> > > > > >> > > > > >> > nodes. In this case, users may just want to limit
> > > > > >> > > > > >> > the load of the node and do not want to kill all the
> > > > > >> > > > > >> > processes on it. Another case is the speculative
> > > > > >> > > > > >> > execution[1] which may also leverage this feature to
> > > > > >> > > > > >> > avoid starting mirror tasks on slow nodes.
> > > > >> > > > > >> >
> > > > >> > > > > >> > Thanks,
> > > > >> > > > > >> > Zhu
> > > > >> > > > > >> >
> > > > >> > > > > >> > [1]
> > > > >> > > > > >> >
> > > > >> > > > > >>
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
> > > > >> > > > > >> >
> > > > > >> > > > > >> > Lijie Wang <[email protected]> wrote on Thu, May 5, 2022 at 15:56:
> > > > >> > > > > >> >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Hi everyone,
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Thanks for your feedback.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > There's one detail that I'd like to re-emphasize
> > > > > >> > > > > >> > > here because it can affect the value and design of
> > > > > >> > > > > >> > > the blocklist mechanism (perhaps I should
> > > > > >> > > > > >> > > highlight it in the FLIP). We propose two actions
> > > > > >> > > > > >> > > in the FLIP:
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > 1) MARK_BLOCKLISTED: Just mark the task manager or
> > > > > >> > > > > >> > > node as blocked. Future slots should not be
> > > > > >> > > > > >> > > allocated from the blocked task manager or node.
> > > > > >> > > > > >> > > But slots that are already allocated will not be
> > > > > >> > > > > >> > > affected. A typical application scenario is to
> > > > > >> > > > > >> > > mitigate machine hotspots. In this case, we hope
> > > > > >> > > > > >> > > that subsequent resource allocations will not be
> > > > > >> > > > > >> > > on the hot machine, but tasks currently running on
> > > > > >> > > > > >> > > it should not be affected.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > 2) MARK_BLOCKLISTED_AND_EVACUATE_TASKS: Mark the
> > > > > >> > > > > >> > > task manager or node as blocked, and evacuate all
> > > > > >> > > > > >> > > tasks on it. Evacuated tasks will be restarted on
> > > > > >> > > > > >> > > non-blocked task managers.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Of the above 2 actions, the former better
> > > > > >> > > > > >> > > highlights the value of this FLIP, because an
> > > > > >> > > > > >> > > external system cannot do that.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Regarding *Manually* and *Automatically*, I
> > > > > >> > > > > >> > > basically agree with @Becket Qin: different users
> > > > > >> > > > > >> > > have different answers. Not all users’ deployment
> > > > > >> > > > > >> > > environments have a special external system that
> > > > > >> > > > > >> > > can perform the anomaly detection. In addition,
> > > > > >> > > > > >> > > adding pluggable/optional auto-detection doesn't
> > > > > >> > > > > >> > > require much extra work on top of manual
> > > > > >> > > > > >> > > specification.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > I will answer your other questions one by one.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > @Yangze
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > a) I think you are right, we do not need to expose
> > > > > >> > > > > >> > > the
> > > > > >> > > > > >> > > `cluster.resource-blocklist.item.timeout-check-interval`
> > > > > >> > > > > >> > > to users.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > b) We can abstract the `notifyException` to a
> > > > > >> > > > > >> > > separate interface (maybe
> > > > > >> > > > > >> > > BlocklistExceptionListener), and the
> > > > > >> > > > > >> > > ResourceManagerBlocklistHandler can implement it
> > > > > >> > > > > >> > > in the future.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > @Martijn
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > a) I also think the manual blocking should be done
> > > > > >> > > > > >> > > by cluster operators.
> > > > > >> > > > > >> > >
> > > > > >> > > > > >> > > b) I think manual blocking makes sense, because
> > > > > >> > > > > >> > > according to my experience, users are often the
> > > > > >> > > > > >> > > first to perceive the machine problems (because of
> > > > > >> > > > > >> > > job failover or delay), and they will contact
> > > > > >> > > > > >> > > cluster operators to solve it, or even tell the
> > > > > >> > > > > >> > > cluster operators which machine is problematic.
> > > > > >> > > > > >> > > From this point of view, I think the people who
> > > > > >> > > > > >> > > really need the manual blocking are the users, and
> > > > > >> > > > > >> > > it’s just performed by the cluster operator, so I
> > > > > >> > > > > >> > > think the manual blocking makes sense.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > @Chesnay
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > We need to touch the logic of JM/SlotPool, because
> > > > > >> > > > > >> > > for MARK_BLOCKLISTED, we need to know whether the
> > > > > >> > > > > >> > > slot is blocklisted when the task is
> > > > > >> > > > > >> > > FINISHED/CANCELLED/FAILED. If so, SlotPool should
> > > > > >> > > > > >> > > release the slot directly to avoid assigning other
> > > > > >> > > > > >> > > tasks (of this job) to it. If we only maintain the
> > > > > >> > > > > >> > > blocklist information on the RM, the JM needs to
> > > > > >> > > > > >> > > retrieve it by RPC. I think the performance
> > > > > >> > > > > >> > > overhead of that is relatively large, so I think
> > > > > >> > > > > >> > > it's worth maintaining the blocklist information
> > > > > >> > > > > >> > > on the JM side and keeping it in sync.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > @Роман
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >     a) “Probably storing inside Zookeeper/Configmap
> > > might
> > > > >> be
> > > > >> > > > helpful
> > > > >> > > > > >> > here.”  Can you explain it in detail? I don't fully
> > > > >> understand
> > > > >> > > that.
> > > > >> > > > > In
> > > > >> > > > > >> my
> > > > >> > > > > >> > opinion, non-active and active are the same, and no
> > > special
> > > > >> > > > treatment
> > > > >> > > > > is
> > > > >> > > > > >> > required.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > b) I agree with you, the `endTimestamp` makes
> sense,
> > I
> > > > will
> > > > >> > add
> > > > >> > > it
> > > > >> > > > > to
> > > > >> > > > > >> > FLIP.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > @Yang
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > As mentioned above, AFAIK, the external system
> > > > > >> > > > > >> > > cannot support the MARK_BLOCKLISTED action.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Looking forward to your further feedback.
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Best,
> > > > >> > > > > >> > >
> > > > >> > > > > >> > > Lijie
> > > > >> > > > > >> > >
> > > > >> > > > > >> > >
> > > > > >> > > > > >> > > Yang Wang <[email protected]> wrote on Tue, May 3, 2022 at 21:09:
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >> Thanks Lijie and Zhu for creating the proposal.
> > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> I want to share some thoughts about Flink cluster
> > > > > >> > > > > >> > >> operations.
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> In the production environment, the SRE (aka Site
> > > > > >> > > > > >> > >> Reliability Engineer) already has many tools to
> > > > > >> > > > > >> > >> detect the unstable nodes, which could take the
> > > > > >> > > > > >> > >> system logs/metrics into consideration.
> > > > > >> > > > > >> > >> Then they use graceful decommission in YARN and
> > > > > >> > > > > >> > >> taint in K8s to prevent new allocations on these
> > > > > >> > > > > >> > >> unstable nodes.
> > > > > >> > > > > >> > >> Finally, they will evict all the containers and
> > > > > >> > > > > >> > >> pods running on these nodes.
> > > > > >> > > > > >> > >> This mechanism also works for planned
> > > > > >> > > > > >> > >> maintenance. So I am afraid this is not the
> > > > > >> > > > > >> > >> typical use case for FLIP-224.
> > > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> If we only support to block nodes manually, then
> > > > > >> > > > > >> > >> I could not see the obvious advantages compared
> > > > > >> > > > > >> > >> with current SRE's approach (via *yarn rmadmin or
> > > > > >> > > > > >> > >> kubectl taint*).
> > > > > >> > > > > >> > >> At least, we need to have a pluggable component
> > > > > >> > > > > >> > >> which could expose the potential unstable nodes
> > > > > >> > > > > >> > >> automatically and block them if enabled
> > > > > >> > > > > >> > >> explicitly.
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >> Best,
> > > > >> > > > > >> > >> Yang
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >>
> > > > > >> > > > > >> > >> Becket Qin <[email protected]> wrote on Mon, May 2, 2022 at 16:36:
> > > > >> > > > > >> > >>
> > > > >> > > > > >> > >> > Thanks for the proposal, Lijie.
> > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > This is an interesting feature and discussion,
> > > > > >> > > > > >> > >> > and somewhat related to the design principle
> > > > > >> > > > > >> > >> > about how people should operate Flink.
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > I think there are three things involved in this
> > > > > >> > > > > >> > >> > FLIP.
> > > > > >> > > > > >> > >> >      a) Detect and report the unstable node.
> > > > > >> > > > > >> > >> >      b) Collect the information of the unstable
> > > > > >> > > > > >> > >> >      node and form a blocklist.
> > > > > >> > > > > >> > >> >      c) Take the action to block nodes.
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > My two cents:
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > 1. It looks like people all agree that Flink
> > > > > >> > > > > >> > >> > should have c). It is not only useful for cases
> > > > > >> > > > > >> > >> > of node failures, but also handy for some
> > > > > >> > > > > >> > >> > planned maintenance.
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > 2. People have different opinions on b), i.e.
> > > > > >> > > > > >> > >> > who should be the brain to make the decision to
> > > > > >> > > > > >> > >> > block a node. I think this largely depends on
> > > > > >> > > > > >> > >> > who we talk to. Different users would probably
> > > > > >> > > > > >> > >> > give different answers. For people who do have
> > > > > >> > > > > >> > >> > a centralized node health management service,
> > > > > >> > > > > >> > >> > letting Flink do just a) and c) would be
> > > > > >> > > > > >> > >> > preferred. So essentially Flink would be one of
> > > > > >> > > > > >> > >> > the sources that may detect unstable nodes,
> > > > > >> > > > > >> > >> > report them to that service, and then take the
> > > > > >> > > > > >> > >> > command from that service to block the
> > > > > >> > > > > >> > >> > problematic nodes. On the other hand, for users
> > > > > >> > > > > >> > >> > who do not have such a service, simply letting
> > > > > >> > > > > >> > >> > Flink be clever by itself to block the
> > > > > >> > > > > >> > >> > suspicious nodes might be desired to ensure the
> > > > > >> > > > > >> > >> > jobs are running smoothly.
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > So that indicates a) and b) here should be
> > > > > >> > > > > >> > >> > pluggable / optional.
> > > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > In light of this, maybe it would make sense to
> > > > > >> > > > > >> > >> > have something pluggable like an
> > > > > >> > > > > >> > >> > UnstableNodeReporter which exposes unstable
> > > > > >> > > > > >> > >> > nodes actively. (A more general interface
> > > > > >> > > > > >> > >> > should be JobInfoReporter<T> which can be used
> > > > > >> > > > > >> > >> > to report any information of type <T>. But I'll
> > > > > >> > > > > >> > >> > just keep the scope relevant to this FLIP
> > > > > >> > > > > >> > >> > here). Personally speaking, I think it is OK to
> > > > > >> > > > > >> > >> > have a default implementation of a reporter
> > > > > >> > > > > >> > >> > which just tells Flink to take action to block
> > > > > >> > > > > >> > >> > problematic nodes and also unblocks them after
> > > > > >> > > > > >> > >> > timeout.
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> > >> > Thanks,
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> > >> > Jiangjie (Becket) Qin
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > On Mon, May 2, 2022 at 3:27 PM Роман Бойко <[email protected]>
> > > > > >> > > > > >> > >> > wrote:
> > > > >> > > > > >> > >> >
> > > > > >> > > > > >> > >> > > Thanks for the good initiative, Lijie and Zhu!
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > If it's possible I'd like to participate in
> > > > > >> > > > > >> > >> > > development.
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > I agree with the 3rd point of Konstantin's
> > > > > >> > > > > >> > >> > > reply - we should consider somehow moving the
> > > > > >> > > > > >> > >> > > information of blocklisted nodes/TMs from the
> > > > > >> > > > > >> > >> > > active ResourceManager to non-active ones.
> > > > > >> > > > > >> > >> > > Probably storing inside Zookeeper/Configmap
> > > > > >> > > > > >> > >> > > might be helpful here.
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > And I agree with Martijn that a lot of
> > > > > >> > > > > >> > >> > > organizations don't want to expose such an
> > > > > >> > > > > >> > >> > > API to a cluster user group. But I think it's
> > > > > >> > > > > >> > >> > > necessary to have the mechanism for
> > > > > >> > > > > >> > >> > > unblocking the nodes/TMs anyway, to avoid
> > > > > >> > > > > >> > >> > > incorrect automatic behaviour.
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > > And one more small suggestion - I think it
> > > > > >> > > > > >> > >> > > would be better to extend the
> > > > > >> > > > > >> > >> > > *BlocklistedItem* class with the
> > > > > >> > > > > >> > >> > > *endTimestamp* field and fill it at item
> > > > > >> > > > > >> > >> > > creation. This simple addition will make it
> > > > > >> > > > > >> > >> > > possible to:
> > > > > >> > > > > >> > >> > >
> > > > > >> > > > > >> > >> > >    - Let users set the exact end time of a
> > > > > >> > > > > >> > >> > >    blocklist item through the REST API
> > > > > >> > > > > >> > >> > >    - Not be tied to a single value of
> > > > > >> > > > > >> > >> > >    *cluster.resource-blacklist.item.timeout*
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> > > On Mon, 2 May 2022 at 14:17, Chesnay Schepler
> <
> > > > >> > > > > >> [email protected]>
> > > > >> > > > > >> > >> > wrote:
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> > > > I do share the concern about blurring the
> > > lines
> > > > a
> > > > >> > bit.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > That said, I'd prefer to not have any
> > > > auto-detection
> > > > >> > and
> > > > >> > > > only
> > > > >> > > > > >> > have an
> > > > >> > > > > >> > >> > > > opt-in mechanism
> > > > >> > > > > >> > >> > > > to manually block processes/nodes. To me
> this
> > > > sounds
> > > > >> > yet
> > > > >> > > > > again
> > > > >> > > > > >> > like one
> > > > >> > > > > >> > >> > > > of those
> > > > >> > > > > >> > >> > > > magical mechanisms that will rarely work
> just
> > > > right.
> > > > >> > > > > >> > >> > > > An external system can leverage way more
> > > > information
> > > > >> > > after
> > > > >> > > > > all.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > Moreover, I'm quite concerned about the
> > > complexity
> > > > >> of
> > > > >> > > this
> > > > >> > > > > >> > proposal.
> > > > >> > > > > >> > >> > > > Tracking on both the RM/JM side; syncing
> > between
> > > > >> > > > components;
> > > > >> > > > > >> > >> > adjustments
> > > > >> > > > > >> > >> > > > to the
> > > > >> > > > > >> > >> > > > slot and resource protocol.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > In a way it seems overly complicated.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > If we look at it purely from an active
> > resource
> > > > >> > > management
> > > > >> > > > > >> > perspective,
> > > > >> > > > > >> > >> > > > then there
> > > > >> > > > > >> > >> > > > isn't really a need to touch the slot
> protocol
> > > at
> > > > >> all
> > > > >> > (or
> > > > >> > > > in
> > > > >> > > > > >> fact
> > > > >> > > > > >> > to
> > > > >> > > > > >> > >> > > > anything in the JobMaster),
> > > > >> > > > > >> > >> > > > because there isn't any point in keeping
> > around
> > > > >> blocked
> > > > >> > > TMs
> > > > >> > > > > in
> > > > >> > > > > >> the
> > > > >> > > > > >> > >> > first
> > > > >> > > > > >> > >> > > > place.
> > > > >> > > > > >> > >> > > > They'd just be idling, potentially shutting
> > down
> > > > >> after
> > > > >> > a
> > > > >> > > > > while
> > > > >> > > > > >> by
> > > > >> > > > > >> > the
> > > > >> > > > > >> > >> > RM
> > > > >> > > > > >> > >> > > > because of
> > > > >> > > > > >> > >> > > > it (unless we _also_ touch that logic).
> > > > >> > > > > >> > >> > > > Here the blocking of a process (be it by
> > > blocking
> > > > >> the
> > > > >> > > > process
> > > > >> > > > > >> or
> > > > >> > > > > >> > node)
> > > > >> > > > > >> > >> > is
> > > > >> > > > > >> > >> > > > equivalent with shutting down the blocked
> > > > >> process(es).
> > > > >> > > > > >> > >> > > > Once the block is lifted we can just spin it
> > > back
> > > > >> up.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > And I do wonder whether we couldn't apply
> the
> > > same
> > > > >> line
> > > > >> > > of
> > > > >> > > > > >> > thinking to
> > > > >> > > > > >> > >> > > > standalone resource management.
> > > > >> > > > > >> > >> > > > Here being able to stop/restart a
> process/node
> > > > >> manually
> > > > >> > > > > should
> > > > >> > > > > >> be
> > > > >> > > > > >> > a
> > > > >> > > > > >> > >> > core
> > > > >> > > > > >> > >> > > > requirement for a Flink deployment anyway.
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > > On 02/05/2022 08:49, Martijn Visser wrote:
> > > > >> > > > > >> > >> > > > > Hi everyone,
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > Thanks for creating this FLIP. I can
> > > understand
> > > > >> the
> > > > >> > > > problem
> > > > >> > > > > >> and
> > > > >> > > > > >> > I see
> > > > >> > > > > >> > >> > > > value
> > > > >> > > > > >> > >> > > > > in the automatic detection and
> > blocklisting. I
> > > > do
> > > > >> > have
> > > > >> > > > some
> > > > >> > > > > >> > concerns
> > > > >> > > > > >> > >> > > with
> > > > >> > > > > >> > >> > > > > the ability to manually specify to be
> > blocked
> > > > >> > > resources.
> > > > >> > > > I
> > > > >> > > > > >> have
> > > > >> > > > > >> > two
> > > > >> > > > > >> > >> > > > > concerns;
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > * Most organizations explicitly have a
> > > > separation
> > > > >> of
> > > > >> > > > > >> concerns,
> > > > >> > > > > >> > >> > meaning
> > > > >> > > > > >> > >> > > > that
> > > > >> > > > > >> > >> > > > > there's a group who's responsible for
> > > managing a
> > > > >> > > cluster
> > > > >> > > > > and
> > > > >> > > > > >> > there's
> > > > >> > > > > >> > >> > a
> > > > >> > > > > >> > >> > > > user
> > > > >> > > > > >> > >> > > > > group who uses that cluster. With the
> > > > >> introduction of
> > > > >> > > > this
> > > > >> > > > > >> > mechanism,
> > > > >> > > > > >> > >> > > the
> > > > >> > > > > >> > >> > > > > latter group now can influence the
> > > > responsibility
> > > > >> of
> > > > >> > > the
> > > > >> > > > > >> first
> > > > >> > > > > >> > group.
> > > > >> > > > > >> > >> > > So
> > > > >> > > > > >> > >> > > > it
> > > > >> > > > > >> > >> > > > > can be possible that someone from the user
> > > group
> > > > >> > blocks
> > > > >> > > > > >> > something,
> > > > >> > > > > >> > >> > > which
> > > > >> > > > > >> > >> > > > > causes an outage (which could result in
> > paging
> > > > >> > > mechanism
> > > > >> > > > > >> > triggering
> > > > >> > > > > >> > >> > > etc)
> > > > >> > > > > >> > >> > > > > which impacts the first group.
> > > > >> > > > > >> > >> > > > > * How big is the group of people who can
> go
> > > > >> through
> > > > >> > the
> > > > >> > > > > >> process
> > > > >> > > > > >> > of
> > > > >> > > > > >> > >> > > > manually
> > > > >> > > > > >> > >> > > > > identifying a node that isn't behaving as
> it
> > > > >> should
> > > > >> > > be? I
> > > > >> > > > > do
> > > > >> > > > > >> > think
> > > > >> > > > > >> > >> > this
> > > > >> > > > > >> > >> > > > > group is relatively limited. Does it then
> > make
> > > > >> sense
> > > > >> > to
> > > > >> > > > > >> > introduce
> > > > >> > > > > >> > >> > such
> > > > >> > > > > >> > >> > > a
> > > > >> > > > > >> > >> > > > > feature, which would only be used by a
> > really
> > > > >> small
> > > > >> > > user
> > > > >> > > > > >> group
> > > > >> > > > > >> > of
> > > > >> > > > > >> > >> > > Flink?
> > > > >> > > > > >> > >> > > > We
> > > > >> > > > > >> > >> > > > > still have to maintain, test and support
> > such
> > > a
> > > > >> > > feature.
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > I'm +1 for the autodetection features, but
> > I'm
> > > > >> > leaning
> > > > >> > > > > >> towards
> > > > >> > > > > >> > not
> > > > >> > > > > >> > >> > > > exposing
> > > > >> > > > > >> > >> > > > > this to the user group but having this
> > > available
> > > > >> > > strictly
> > > > >> > > > > for
> > > > >> > > > > >> > cluster
> > > > >> > > > > >> > >> > > > > operators. They could then also set up
> their
> > > > >> > > > > >> > paging/metrics/logging
> > > > >> > > > > >> > >> > > > system
> > > > >> > > > > >> > >> > > > > to take this into account.
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > Best regards,
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > Martijn Visser
> > > > >> > > > > >> > >> > > > > https://twitter.com/MartijnVisser82
> > > > >> > > > > >> > >> > > > > https://github.com/MartijnVisser
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > > On Fri, 29 Apr 2022 at 09:39, Yangze Guo <
> > > > >> > > > > [email protected]
> > > > >> > > > > >> >
> > > > >> > > > > >> > wrote:
> > > > >> > > > > >> > >> > > > >
> > > > >> > > > > >> > >> > > > >> Thanks for driving this, Zhu and Lijie.
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> +1 for the overall proposal. Just share
> > some
> > > > >> cents
> > > > >> > > here:
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> - Why do we need to expose
> > > > >> > > > > >> > >> > > > >>
> > > > >> > cluster.resource-blacklist.item.timeout-check-interval
> > > > >> > > > to
> > > > >> > > > > >> the
> > > > >> > > > > >> > user?
> > > > >> > > > > >> > >> > > > >> I think the semantics of
> > > > >> > > > > >> > `cluster.resource-blacklist.item.timeout`
> > > > >> > > > > >> > >> > is
> > > > >> > > > > >> > >> > > > >> sufficient for the user. How to guarantee
> > the
> > > > >> > timeout
> > > > >> > > > > >> > mechanism is
> > > > >> > > > > >> > >> > > > >> Flink's internal implementation. I think
> it
> > > > will
> > > > >> be
> > > > >> > > very
> > > > >> > > > > >> > confusing
> > > > >> > > > > >> > >> > and
> > > > >> > > > > >> > >> > > > >> we do not need to expose it to users.
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> - ResourceManager can notify the
> exception
> > > of a
> > > > >> task
> > > > >> > > > > >> manager to
> > > > >> > > > > >> > >> > > > >> `BlacklistHandler` as well.
> > > > >> > > > > >> > >> > > > >> For example, the slot allocation might
> fail
> > > in
> > > > >> case
> > > > >> > > the
> > > > >> > > > > >> target
> > > > >> > > > > >> > task
> > > > >> > > > > >> > >> > > > >> manager is busy or has a network jitter.
> I
> > > > don't
> > > > >> > mean
> > > > >> > > we
> > > > >> > > > > >> need
> > > > >> > > > > >> > to
> > > > >> > > > > >> > >> > cover
> > > > >> > > > > >> > >> > > > >> this case in this version, but we can
> also
> > > > open a
> > > > >> > > > > >> > `notifyException`
> > > > >> > > > > >> > >> > in
> > > > >> > > > > >> > >> > > > >> `ResourceManagerBlacklistHandler`.
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> - Before we sync the blocklist to
> > > > >> ResourceManager,
> > > > >> > > will
> > > > >> > > > > the
> > > > >> > > > > >> > slot of
> > > > >> > > > > >> > >> > a
> > > > >> > > > > >> > >> > > > >> blocked task manager continue to be
> > released
> > > > and
> > > > >> > > > > allocated?
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> Best,
> > > > >> > > > > >> > >> > > > >> Yangze Guo
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > > >> On Thu, Apr 28, 2022 at 3:11 PM Lijie
> Wang
> > <
> > > > >> > > > > >> > >> > [email protected]>
> > > > >> > > > > >> > >> > > > >> wrote:
> > > > >> > > > > >> > >> > > > >>> Hi Konstantin,
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> Thanks for your feedback. I will
> respond to
> > > > your 4
> > > > >> > > > remarks:
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> 1) Thanks for reminding me of the
> > > > controversy. I
> > > > >> > > think
> > > > >> > > > > >> > “BlockList”
> > > > >> > > > > >> > >> > is
> > > > >> > > > > >> > >> > > > >> good
> > > > >> > > > > >> > >> > > > >>> enough, and I will change it in FLIP.
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> 2) Your suggestion for the REST API is a
> > > good
> > > > >> idea.
> > > > >> > > > Based
> > > > >> > > > > >> on
> > > > >> > > > > >> > the
> > > > >> > > > > >> > >> > > > above, I
> > > > >> > > > > >> > >> > > > >>> would change the REST API as follows:
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> POST/GET <host>/blocklist/nodes
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> POST/GET <host>/blocklist/taskmanagers
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> DELETE
> <host>/blocklist/node/<identifier>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> DELETE
> > > > <host>/blocklist/taskmanager/<identifier>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> 3) If a node is blocking/blocklisted, it
> > > means
> > > > >> that
> > > > >> > > all
> > > > >> > > > > >> task
> > > > >> > > > > >> > >> > managers
> > > > >> > > > > >> > >> > > > on
> > > > >> > > > > >> > >> > > > >>> this node are blocklisted. All slots on
> > > these
> > > > >> TMs
> > > > >> > are
> > > > >> > > > not
> > > > >> > > > > >> > >> > available.
> > > > >> > > > > >> > >> > > > This
> > > > >> > > > > >> > >> > > > >>> is actually a bit like TM loss, but
> these
> > > TMs
> > > > >> are
> > > > >> > > not
> > > > >> > > > > >> really
> > > > >> > > > > >> > lost,
> > > > >> > > > > >> > >> > > > they
> > > > >> > > > > >> > >> > > > >>> are in an unavailable status, and they
> are
> > > > still
> > > > >> > > > > registered
> > > > >> > > > > >> > in this
> > > > >> > > > > >> > >> > > > flink
> > > > >> > > > > >> > >> > > > >>> cluster. They will be available again
> once
> > > the
> > > > >> > > > > >> corresponding
> > > > >> > > > > >> > >> > > blocklist
> > > > >> > > > > >> > >> > > > >> item
> > > > >> > > > > >> > >> > > > >>> is removed. This behavior is the same in
> > > > >> > > > > active/non-active
> > > > >> > > > > >> > >> > clusters.
> > > > >> > > > > >> > >> > > > >>> However in the active clusters, these
> TMs
> > > may
> > > > be
> > > > >> > > > released
> > > > >> > > > > >> due
> > > > >> > > > > >> > to
> > > > >> > > > > >> > >> > idle
> > > > >> > > > > >> > >> > > > >>> timeouts.
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> 4) For the item timeout, I prefer to
> keep
> > > it.
> > > > >> The
> > > > >> > > > reasons
> > > > >> > > > > >> are
> > > > >> > > > > >> > as
> > > > >> > > > > >> > >> > > > >> follows:
> > > > >> > > > > >> > >> > > > >>> a) The timeout will not affect users
> > adding
> > > or
> > > > >> > > removing
> > > > >> > > > > >> items
> > > > >> > > > > >> > via
> > > > >> > > > > >> > >> > > REST
> > > > >> > > > > >> > >> > > > >> API,
> > > > >> > > > > >> > >> > > > >>> and users can disable it by configuring
> it
> > > to
> > > > >> > > > > >> Long.MAX_VALUE .
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> b) Some node problems can recover after
> a
> > > > >> period of
> > > > >> > > > time
> > > > >> > > > > >> > (such as
> > > > >> > > > > >> > >> > > > machine
> > > > >> > > > > >> > >> > > > >>> hotspots), in which case users may
> prefer
> > > that
> > > > >> > Flink
> > > > >> > > > can
> > > > >> > > > > do
> > > > >> > > > > >> > this
> > > > >> > > > > >> > >> > > > >>> automatically instead of requiring the
> > user
> > > to
> > > > >> do
> > > > >> > it
> > > > >> > > > > >> manually.
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> Best,
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> Lijie
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>> Konstantin Knauf <[email protected]>
> > > > >> wrote on Wed, Apr 27, 2022
> > > > >> > > > > >> at 19:23:
> > > > >> > > > > >> > >> > > > >>>
> > > > >> > > > > >> > >> > > > >>>> Hi Lijie,
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> I think, this makes sense and +1 to
> only
> > > > >> support
> > > > >> > > > > manually
> > > > >> > > > > >> > blocking
> > > > >> > > > > >> > >> > > > >>>> taskmanagers and nodes. Maybe the
> > different
> > > > >> > > strategies
> > > > >> > > > > can
> > > > >> > > > > >> > also be
> > > > >> > > > > >> > >> > > > >>>> maintained outside of Apache Flink.
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> A few remarks:
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> 1) Can we use another term than
> > > "blacklist"
> > > > >> due
> > > > >> > to
> > > > >> > > > the
> > > > >> > > > > >> > >> > controversy
> > > > >> > > > > >> > >> > > > >> around
> > > > >> > > > > >> > >> > > > >>>> the term? [1] There was also a Jira
> > Ticket
> > > > >> about
> > > > >> > > this
> > > > >> > > > > >> topic a
> > > > >> > > > > >> > >> > while
> > > > >> > > > > >> > >> > > > >> back
> > > > >> > > > > >> > >> > > > >>>> and there was generally a consensus to
> > > avoid
> > > > >> the
> > > > >> > > term
> > > > >> > > > > >> > blacklist &
> > > > >> > > > > >> > >> > > > >> whitelist
> > > > >> > > > > >> > >> > > > >>>> [2]? We could use "blocklist"
> "denylist"
> > or
> > > > >> > > > > "quarantined"
> > > > >> > > > > >> > >> > > > >>>> 2) For the REST API, I'd prefer a
> > slightly
> > > > >> > different
> > > > >> > > > > >> design
> > > > >> > > > > >> > as
> > > > >> > > > > >> > >> > verbs
> > > > >> > > > > >> > >> > > > >> like
> > > > >> > > > > >> > >> > > > >>>> add/remove are often considered an
> > anti-pattern
> > > > for
> > > > >> > REST
> > > > >> > > > > APIs.
> > > > >> > > > > >> > POST
> > > > >> > > > > >> > >> > on a
> > > > >> > > > > >> > >> > > > >> list
> > > > >> > > > > >> > >> > > > >>>> item is generally the standard to add
> > > items.
> > > > >> > DELETE
> > > > >> > > on
> > > > >> > > > > the
> > > > >> > > > > >> > >> > > individual
> > > > >> > > > > >> > >> > > > >>>> resource is standard to remove an item.
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> POST <host>/quarantine/items
> > > > >> > > > > >> > >> > > > >>>> DELETE
> > > > <host>/quarantine/items/<itemidentifier>
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> We could also consider to separate
> > > > taskmanagers
> > > > >> > and
> > > > >> > > > > nodes
> > > > >> > > > > >> in
> > > > >> > > > > >> > the
> > > > >> > > > > >> > >> > > REST
> > > > >> > > > > >> > >> > > > >> API
> > > > >> > > > > >> > >> > > > >>>> (and internal data structures). Any
> > opinion
> > > > on
> > > > >> > this?
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> POST/GET <host>/quarantine/nodes
> > > > >> > > > > >> > >> > > > >>>> POST/GET <host>/quarantine/taskmanager
> > > > >> > > > > >> > >> > > > >>>> DELETE
> > <host>/quarantine/nodes/<identifier>
> > > > >> > > > > >> > >> > > > >>>> DELETE
> > > > >> <host>/quarantine/taskmanager/<identifier>
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> 3) How would blocking nodes behave with
> > > > >> non-active
> > > > >> > > > > >> resource
> > > > >> > > > > >> > >> > > managers,
> > > > >> > > > > >> > >> > > > >> i.e.
> > > > >> > > > > >> > >> > > > >>>> standalone or reactive mode?
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> 4) To keep the implementation even more
> > > > >> minimal,
> > > > >> > do
> > > > >> > > we
> > > > >> > > > > >> need
> > > > >> > > > > >> > the
> > > > >> > > > > >> > >> > > > timeout
> > > > >> > > > > >> > >> > > > >>>> behavior? If items are added/removed
> > > manually
> > > > >> we
> > > > >> > > could
> > > > >> > > > > >> > delegate
> > > > >> > > > > >> > >> > this
> > > > >> > > > > >> > >> > > > >> to the
> > > > >> > > > > >> > >> > > > >>>> user easily. In my opinion the timeout
> > > > behavior
> > > > >> > > would
> > > > >> > > > > >> better
> > > > >> > > > > >> > fit
> > > > >> > > > > >> > >> > > into
> > > > >> > > > > >> > >> > > > >>>> specific strategies at a later point.
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> Looking forward to your thoughts.
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> Cheers and thank you,
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> Konstantin
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> [1]
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> >
> > > > >> > > > > >>
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://en.wikipedia.org/wiki/Blacklist_(computing)#Controversy_over_use_of_the_term
> > > > >> > > > > >> > >> > > > >>>> [2]
> > > > >> > > https://issues.apache.org/jira/browse/FLINK-18209
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>> On Wed, Apr 27, 2022 at 04:04, Lijie Wang <[email protected]> wrote:
> > > > >> > > > > >> > >> > > > >>>>
> > > > >> > > > > >> > >> > > > >>>>> Hi all,
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> Flink job failures may happen due to
> > > cluster
> > > > >> node
> > > > >> > > > > issues
> > > > >> > > > > >> > >> > > > >> (insufficient
> > > > >> > > > > >> > >> > > > >>>> disk
> > > > >> > > > > >> > >> > > > >>>>> space, bad hardware, network
> > > abnormalities).
> > > > >> > Flink
> > > > >> > > > will
> > > > >> > > > > >> > take care
> > > > >> > > > > >> > >> > > of
> > > > >> > > > > >> > >> > > > >> the
> > > > >> > > > > >> > >> > > > >>>>> failures and redeploy the tasks.
> > However,
> > > > due
> > > > >> to
> > > > >> > > data
> > > > >> > > > > >> > locality
> > > > >> > > > > >> > >> > and
> > > > >> > > > > >> > >> > > > >>>> limited
> > > > >> > > > > >> > >> > > > >>>>> resources, the new tasks are very
> likely
> > > to
> > > > be
> > > > >> > > > > redeployed
> > > > >> > > > > >> > to the
> > > > >> > > > > >> > >> > > same
> > > > >> > > > > >> > >> > > > >>>>> nodes, which will result in continuous
> > > task
> > > > >> > > > > abnormalities
> > > > >> > > > > >> > and
> > > > >> > > > > >> > >> > > affect
> > > > >> > > > > >> > >> > > > >> job
> > > > >> > > > > >> > >> > > > >>>>> progress.
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> Currently, Flink users need to
> manually
> > > > >> identify
> > > > >> > > the
> > > > >> > > > > >> > problematic
> > > > >> > > > > >> > >> > > > >> node and
> > > > >> > > > > >> > >> > > > >>>>> take it offline to solve this problem.
> > But
> > > > >> this
> > > > >> > > > > approach
> > > > >> > > > > >> has
> > > > >> > > > > >> > >> > > > >> following
> > > > >> > > > > >> > >> > > > >>>>> disadvantages:
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> 1. Taking a node offline can be a
> heavy
> > > > >> process.
> > > > >> > > > Users
> > > > >> > > > > >> may
> > > > >> > > > > >> > need
> > > > >> > > > > >> > >> > to
> > > > >> > > > > >> > >> > > > >>>> contact
> > > > >> > > > > >> > >> > > > >>>>> cluster administrators to do this. The
> > > > operation
> > > > >> can
> > > > >> > > > even
> > > > >> > > > > be
> > > > >> > > > > >> > >> > dangerous
> > > > >> > > > > >> > >> > > > >> and
> > > > >> > > > > >> > >> > > > >>>> not
> > > > >> > > > > >> > >> > > > >>>>> allowed during some important business
> > > > events.
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> 2. Identifying and solving this kind
> of
> > > > >> problems
> > > > >> > > > > manually
> > > > >> > > > > >> > would
> > > > >> > > > > >> > >> > be
> > > > >> > > > > >> > >> > > > >> slow
> > > > >> > > > > >> > >> > > > >>>> and
> > > > >> > > > > >> > >> > > > >>>>> a waste of human resources.
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> To solve this problem, Zhu Zhu and I
> > > propose
> > > > >> to
> > > > >> > > > > >> introduce a
> > > > >> > > > > >> > >> > > blacklist
> > > > >> > > > > >> > >> > > > >>>>> mechanism for Flink to filter out
> > > > problematic
> > > > >> > > > > resources.
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> You can find more details in
> > FLIP-224[1].
> > > > >> Looking
> > > > >> > > > > forward
> > > > >> > > > > >> > to your
> > > > >> > > > > >> > >> > > > >>>> feedback.
> > > > >> > > > > >> > >> > > > >>>>> [1]
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> >
> > > > >> > > > > >>
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-224%3A+Blacklist+Mechanism
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> Best,
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > > >>>>> Lijie
> > > > >> > > > > >> > >> > > > >>>>>
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > > >
> > > > >> > > > > >> > >> > >
> > > > >> > > > > >> > >> >
> > > > >> > > > > >> >
> > > > >> > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Best regards,
> > > > >> > > Roman Boyko
> > > > >> > > e.: [email protected]
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > https://twitter.com/snntrable
> > > https://github.com/knaufk
> > >
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>

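The item-expiry behaviour discussed in this thread (Roman's suggestion to store an endTimestamp on each blocklisted item, the rule that a request carries either a timeout or an endTimestamp but not both, and the error-or-merge choice when an item already exists) could be sketched in Java roughly as follows. This is only an illustration: the class name BlocklistSketch, its method names, and the "later end timestamp wins" merge rule are assumptions made for this example and are not taken from FLIP-224 or from Flink's code base.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; names and the merge rule are assumptions, not Flink APIs. */
final class BlocklistSketch {

    /** Blocked task manager or node id -> absolute end timestamp in epoch milliseconds. */
    private final Map<String, Long> endTimestamps = new HashMap<>();

    /**
     * Turns a request into an absolute end timestamp. Exactly one of timeoutMs and
     * endTimestamp must be non-null; any other combination is rejected.
     * Assumes nowMs is non-negative.
     */
    public static long resolveEnd(long nowMs, Long timeoutMs, Long endTimestamp) {
        if ((timeoutMs == null) == (endTimestamp == null)) {
            throw new IllegalArgumentException("specify exactly one of timeout and endTimestamp");
        }
        if (endTimestamp != null) {
            return endTimestamp;
        }
        // Saturate instead of overflowing, so a timeout of Long.MAX_VALUE never expires.
        return timeoutMs > Long.MAX_VALUE - nowMs ? Long.MAX_VALUE : nowMs + timeoutMs;
    }

    /**
     * Adds an item. If the id already exists, either fails (allowMerge == false)
     * or keeps the later of the two end timestamps (assumed merge rule).
     */
    public void add(String id, long endTimestamp, boolean allowMerge) {
        if (endTimestamps.containsKey(id) && !allowMerge) {
            throw new IllegalStateException("item already exists: " + id);
        }
        endTimestamps.merge(id, endTimestamp, Long::max);
    }

    /** Removes an item manually; returns true if it was present. */
    public boolean remove(String id) {
        return endTimestamps.remove(id) != null;
    }

    /** An item blocks its resource until its end timestamp is reached. */
    public boolean isBlocked(String id, long nowMs) {
        Long end = endTimestamps.get(id);
        return end != null && end > nowMs;
    }

    /** Periodic cleanup: drops every item whose end timestamp has passed. */
    public void removeExpired(long nowMs) {
        endTimestamps.values().removeIf(end -> end <= nowMs);
    }
}
```

Storing an absolute end timestamp rather than a relative timeout means each item carries its own expiry, which is the property Roman's suggestion relies on for not being tied to a single cluster-wide timeout value.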