@Dong Ugh, you are right. This is indeed trickier than I imagined. You could argue that the problem here is that there is no single source of truth for the expected replication factor. While a reassignment is in progress, the reassignment itself holds the expected replica set; otherwise, it is stored in the partition metadata itself. This is why manually deleting the reassignment is problematic: we lose the expected replica set and we depend on users to reinstall it. I'm starting to think that the way we track reassignment state in the controller is the real problem. In addition to the issues caused by deletion, we cannot easily change an existing reassignment.
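To make the "two sources of truth" concrete, here is a minimal sketch in Python. The znode paths and JSON shapes follow Kafka's ZooKeeper layout; the lookup helper itself is illustrative and not Kafka code:

```python
# Illustration only (not Kafka source): the two places the controller can
# find an "expected" replica set while a reassignment is in flight.

# /admin/reassign_partitions -- exists only while a reassignment is pending
pending_reassignment = {
    "version": 1,
    "partitions": [
        {"topic": "foo", "partition": 0, "replicas": [4, 5, 6]},
    ],
}

# /brokers/topics/foo -- the partition's current assignment; during a move
# this holds the union of old and new replicas
topic_metadata = {
    "version": 1,
    "partitions": {"0": [1, 2, 3, 4, 5, 6]},
}

def expected_replicas(topic, partition):
    """Resolve the expected replica set, preferring a pending reassignment."""
    for p in pending_reassignment["partitions"]:
        if p["topic"] == topic and p["partition"] == partition:
            return p["replicas"]
    # Fallback: the (possibly inflated) assignment in the topic metadata.
    return topic_metadata["partitions"][str(partition)]
```

If the reassignment znode is deleted mid-move, the fallback returns the inflated old-plus-new assignment, which is exactly how the expected replication factor gets lost.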
High level, what I'm thinking is that we need to move the pending reassignment state out of the single znode and into the individual metadata of the reassigned partitions, so that there is a single source of truth for the expected replicas (and hence the replication factor). This would also give us an easier mechanism to manage the batching of multiple reassignments. Let me think on it a bit and see if I can come up with a proposal.

@Gwen, @Ismael That is fair. I also prefer to redefine URP if we think the compatibility impact is a lower concern than continued misuse.

-Jason

On Fri, Aug 3, 2018 at 12:25 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Right, that was my thinking too.
>
> Ismael
>
> On Fri, Aug 3, 2018 at 12:04 PM Gwen Shapira <g...@confluent.io> wrote:
>
> > On Fri, Aug 3, 2018 at 11:23 AM, Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hey Ismael,
> > >
> > > Yeah, my initial inclination was to redefine URP as well. My only
> > > doubt was how it would affect existing tools which might depend on
> > > URPs to track the progress of a reassignment. I decided to be
> > > conservative in the end, but I'd reconsider if we think it is not a
> > > major concern. It is annoying to need a new category.
> >
> > There are existing tools that use URP to track reassignment, but there
> > are many more tools that use URP for monitoring and alerting. If I
> > understand Ismael's suggestion correctly, a redefinition will improve
> > the reliability of the monitoring tools (since there won't be false
> > alerts in case of reassignment) without having to switch to a new
> > metric.
> >
> > I think we should choose the proposal that improves the more common
> > usage of the metric, in this case, failure monitoring rather than
> > reassignment.
> >
> > > About your question about storage in ZK, I can't think of anything
> > > additional that we need. Probably the main difficulty is getting
> > > access to the replication factor in the topic utility. My basic
> > > thought was just to collect the URPs (as we know them today) and use
> > > the config API to partition them based on the replication factor. Do
> > > you see any problems with this?
> > >
> > > -Jason
> > >
> > > On Thu, Aug 2, 2018 at 12:14 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Thanks Jason. This is definitely a pain point. I actually prefer
> > > > the option to redefine what under-replicated means (currently under
> > > > rejected alternatives). Also, do we need to make changes to what we
> > > > store in ZK? If so, that should be in the KIP too.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Aug 2, 2018 at 11:45 AM Jason Gustafson <ja...@confluent.io> wrote:
> > > >
> > > > > Hey All,
> > > > >
> > > > > Another day, another KIP. This one is hopefully straightforward:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-352%3A+Distinguish+URPs+caused+by+reassignment
> > > > > Have a look and let me know what you think!
> > > > >
> > > > > Thanks,
> > > > > Jason
>
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
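The approach Jason describes in the quoted message above (collect the URPs as we know them today, then split them by the topic's configured replication factor) could be sketched roughly as follows. The function name and the tuple layout are illustrative, not Kafka's actual admin API:

```python
# Hedged sketch: split today's URPs (ISR smaller than the assigned replica
# set) into "real" URPs vs. URPs that only exist because a reassignment has
# temporarily inflated the assignment beyond the configured RF.

def split_urps(partitions, replication_factor):
    """partitions: iterable of (topic, partition, assigned_replicas, isr).
    replication_factor: topic -> configured RF (e.g. from the config API).
    Returns (true_urps, reassignment_urps) as lists of (topic, partition)."""
    true_urps, reassignment_urps = [], []
    for topic, part, assigned, isr in partitions:
        if len(isr) >= len(assigned):
            continue  # fully replicated; not a URP at all
        if len(isr) < replication_factor[topic]:
            # Genuinely missing replicas: alert-worthy.
            true_urps.append((topic, part))
        else:
            # ISR already covers the configured RF; the "missing" replicas
            # are just reassignment targets still catching up.
            reassignment_urps.append((topic, part))
    return true_urps, reassignment_urps
```

For example, a partition with assignment [1, 2, 3, 4, 5, 6], ISR [1, 2, 3], and RF 3 would land in the reassignment bucket, while the same ISR shrunk to [1, 2] would make it a true URP.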