Ivan, thanks for reviewing my proposal. I will answer your questions inline.

> When a leader fails, does the new leader automatically create a new
> assignment, or does it continue with the assignment from the previous
> leader?

The new leader will resume scheduling duties with the current set of assignments. All assignments are stored in the "assignments" topic, so every worker should have the latest view of the assignments once it has caught up reading the topic.

> Is the drain request a new concept in the model?

Yes.

> I would suggest it
> would be better for the drain command to mark the worker as
> unschedulable (persisted). Then the check for whether draining is
> complete is whether the worker is doing any work (i.e. whether it has
> seen and processed the schedule that it's no longer a part of). This
> way there's no "drain" request to track as such. There's marking the
> worker as unschedulable, which is idempotent.

Marking a worker as unschedulable in a persisted manner is non-trivial. We would have to introduce a new internal topic to track this, or leverage something like ZK. I didn't think it was worthwhile to add these new resources and potential dependencies for this feature. That is why I proposed that the leader track drains only in memory; if a leader fails during a drain, the client can simply restart the process.

> The leader should work in a declarative rather than imperative
> fashion. i.e. it should generate the desired schedule, and the workers
> should work to match this schedule. This should avoid the leader
> failing issue.

Pulsar Functions already does this. The leader is always the only entity that creates new assignments and the only writer to the "assignments" topic. All assignments are persisted in the "assignments" topic and are unaffected by failed leaders. The next leader will pick up from where the previous leader left off.
However, we currently do not have a way to attach persistent state or metadata to workers, such as marking them as "draining".

Best,

Jerry

On Wed, Sep 1, 2021 at 12:25 AM Ivan Kelly <iv...@apache.org> wrote:

> > When the leader receives a request to drain a worker, it must first mark
> > the worker as in the process to be drained i.e. blacklist the worker so
> > that no new assignments can be assigned to it. We can perhaps just save
> > the blacklist in memory. The worker should then create a new scheduling in
> > which the assignments of the worker to be drained are moved to other
> > workers perhaps in a round robin distribution. Afterwards, the leader
> > should mark the drain of the worker to be complete.
> >
> > There are some caveats to this approach. If the leader fails before
> > completing the drain request. The drain request will not be fulfilled.
> > However, if the client frequently checks the status of the drain, it
> > should notice that the drain is not running and can re-submit a request.
>
> A couple of questions/comments.
>
> When a leader fails, does the new leader automatically create a new
> assignment, or does it continue with the assignment from the previous
> leader?
>
> Is the drain request a new concept in the model? I would suggest it
> would be better for the drain command to mark the worker as
> unschedulable (persisted). Then the check for whether draining is
> complete is whether the worker is doing any work (i.e. whether it has
> seen and processed the schedule that it's no longer a part of). This
> way there's no "drain" request to track as such. There's marking the
> worker as unschedulable, which is idempotent.
>
> The leader should work in a declarative rather than imperative
> fashion. i.e. it should generate the desired schedule, and the workers
> should work to match this schedule. This should avoid the leader
> failing issue.
>
> -Ivan
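
For illustration only, the in-memory drain flow discussed in this thread could be sketched roughly as below. This is a toy Python model, not Pulsar Functions code: the `Leader` class, its fields, and the `drain` / `drain_status` methods are hypothetical names, and a real leader would publish the new schedule to the assignments topic instead of mutating a dict.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, List, Set


@dataclass
class Leader:
    """Hypothetical model of the leader; names here are assumptions."""

    # worker id -> function instance ids; stands in for the persisted
    # assignments topic that survives leader failure
    assignments: Dict[str, List[str]]
    # workers being drained; in memory only, so lost when the leader fails
    draining: Set[str] = field(default_factory=set)

    def drain(self, worker: str) -> None:
        """Block new assignments to `worker` and move its work elsewhere."""
        targets = sorted(
            w for w in self.assignments
            if w != worker and w not in self.draining
        )
        if not targets:
            raise RuntimeError("no schedulable workers left")
        self.draining.add(worker)  # set.add is idempotent
        moved = self.assignments[worker]
        self.assignments[worker] = []
        # round-robin distribution over the remaining workers
        for instance, target in zip(moved, cycle(targets)):
            self.assignments[target].append(instance)

    def drain_status(self, worker: str) -> str:
        """Status a client could poll to decide whether to re-submit."""
        if worker not in self.draining:
            return "not_running"
        return "running" if self.assignments[worker] else "complete"


leader = Leader({"w1": ["f1", "f2", "f3"], "w2": [], "w3": ["f4"]})
leader.drain("w1")
print(leader.assignments)
print(leader.drain_status("w1"))

# A new leader sees the same persisted assignments but no drain state.
new_leader = Leader(dict(leader.assignments))
print(new_leader.drain_status("w1"))
```

In this sketch a newly elected leader reports "not_running" for the drained worker because the drain set was never persisted, which is the caveat noted above: the client has to notice this and re-submit the request.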