Re: Improve the process of removing bookies from a cluster

2021-09-09 Thread Ivan Kelly
> Do you think it will be different enough from the autorecovery process to put it on the bookie being drained or should it still reside within the autorecovery process?
I think there's enough commonality that a single solution can be applied to both. At root, both have to copy entries and upda

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Michael Marshall
> One thing that running the draining on the local bookie doesn't cover, is that, if the bookie is down and unrecoverable, the bookie will never be drained, so the data on the bookie would remain underreplicated. Perhaps this is a different case, and needs to be handled differently, but

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Ivan Kelly
Hi Yang,
> Besides the auditor, I think the external operator (whether a human operator or an automation program) also cares about the "draining" state of a bookie.
This isn't a question of the internal model, but of how it is exposed. API-wise, it would not be a problem to expose draining as

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Yang Yang
> One thing that running the draining on the local bookie doesn't cover, is that, if the bookie is down and unrecoverable, the bookie will never be drained, so the data on the bookie would remain underreplicated. Perhaps this is a different case, and needs to be handled differently, but

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Yang Yang
Hi Ivan, Thanks for the explanation!
> "draining" state is not something that anyone but the auditor needs to care about. It's not a state attribute of the bookie, but instead an external service's opinion of what should be happening with the bookie. As such, it doesn't belong in the booki

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Ivan Kelly
> I am not very familiar with bookkeeper and auditor history (so please let me know if this understanding doesn't work), but it seems to me that the process responsible for draining the bookie could be local to the bookie itself to limit network hops.
This is a very good point. One of the re

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Michael Marshall
Hi Ivan and Yang, ++1 I am very happy to see this initiative. It will be a fantastic improvement, and I am happy to help contribute, if help is needed. I agree that the first step is adding an endpoint to mark a bookie as read only in a persistent way, and that the "draining" state only really ne

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Venkateswara Rao Jujjuri
Glad to see this thread. This is one of the biggest limitations to autoscaling.
On Tue, Sep 7, 2021 at 6:11 AM Jonathan Ellis wrote:
> On Tue, Sep 7, 2021 at 8:05 AM Ivan Kelly wrote:
> > Hi Yang,
> > > Autoscaling is exactly one motivation for me to bring this topic up. I understan

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Jonathan Ellis
On Tue, Sep 7, 2021 at 8:05 AM Ivan Kelly wrote:
> Hi Yang,
> > Autoscaling is exactly one motivation for me to bring this topic up. I understand that the auto-recovery is not perfect at the moment, but it's an important component that maintains the core invariants of a bookkeeper

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Ivan Kelly
Hi Yang,
> Autoscaling is exactly one motivation for me to bring this topic up. I understand that the auto-recovery is not perfect at the moment, but it's an important component that maintains the core invariants of a bookkeeper cluster, so I think we may keep improving it until we find a be

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Yang Yang
Hi Ivan, Thanks for sharing these insights! Autoscaling is exactly one motivation for me to bring this topic up. I understand that the auto-recovery is not perfect at the moment, but it's an important component that maintains the core invariants of a bookkeeper cluster, so I think we may keep imp

Re: Improve the process of removing bookies from a cluster

2021-09-06 Thread Ivan Kelly
Hi Yang, This is something we've been thinking about internally. It's especially important if we want to implement auto scaling for bookies. I'm not sure we need a "draining" state as such. Or at least, the draining state doesn't need to be at the same level as "read-only". "draining" is only int

Improve the process of removing bookies from a cluster

2021-09-05 Thread Yang Yang
Hello everyone, I have been using BookKeeper as part of Pulsar clusters for a while and noticed that the process of decommissioning a bookie (or the recover command) is not