Thanks, Konstantine. I've incorporated your suggestion about the `tick()` mention, and it sounds like my recent changes satisfy all of your other concerns (which seem relatively minor). So I'll go ahead and start a vote and we can continue to fine tune the wording.
Best regards, Randall On Mon, Jun 7, 2021 at 2:31 PM Konstantine Karantasis <konstant...@confluent.io.invalid> wrote: > Thanks Randall. I reply inline as well. > > > On Sat, Jun 5, 2021 at 1:57 PM Randall Hauch <rha...@apache.org> wrote: > > > Thanks for the review and feedback, Konstantine. I think I've addressed > all > > of your concerns below. > > > > On Fri, Jun 4, 2021 at 7:55 PM Konstantine Karantasis > > <konstant...@confluent.io.invalid> wrote: > > > > > Thanks for the KIP Randall! > > > Really happy to see this feature coming along finally. Will indeed > > resolve > > > an unintuitive aspect of Connect's REST API. > > > > > > I'm in favor of the current approach overall. Just a few comments below > > > that could use a brief discussion: > > > > > > 1) Is there a specific reason why you choose the config topic to > persist > > > the restart requests? > > > I know it doesn't really matter which internal topic is used for > > backwards > > > compatibility, because the worker ignores records that does not > > understand. > > > > > > > I think there's value in the restart request record being ordered along > > with the other config topic records, including configuration records, > > target state change records, and session key updates. This also aligns > with > > how the distributed herder will process the restart requests within its > > `tick()` method, again making sure there are no ongoing starts and stops > > from rebalances or new connectors while the restarts are taking place. > Plus > > there is already a built-in config topic listener mechanism in the > > distributed herder. > > > > I think those are good reasons to use the config topic. But there also is > > not a viable alternative: the status topic has no existing listener > > mechanism in the StatusBackingStore, and the offset topic is clearly not > > applicable. > > > > These are good points that in practice make the config topic preferrable. > Even though some sound more like implementation details, I think saying > explicitly why the config topic gives us a good tradeoff (you mostly say > it) and why the status topic is not preferred (some info is missing here, > especially regarding the listener) is valuable. > I've tried to mention why the config topic is applicable here in the final paragraph of the "Distributed Runtime" section. And I previously added a third Rejected Alternative that mentions the cons of the status topic and explicitly mentions "the StatusBackingStore interface does not define a listener mechanism". Let me know if you still are looking for other clarification. > Also a bit of a nit but mentioning the "tick()" method might not be clear > for the uninitiated with the internals of the Connect framework. > Calling this the main thread loop of the workers might be simpler. > Good idea. I've updated the KIP. > > > > But are there any implications upon worker restart? I read you > > specifically > > > mention that a worker will read restart requests upon its own restart. > > But > > > is this really desirable? > > > > > > > It will read them, just like all of the config records, target state > change > > records, and session key updates. However, upon worker restart the herder > > will ignore any restart requests, just like it ignores session key update > > and task state change records. > > > > > > > The config topic offers a way to persist configurations of the latest > > > running connectors and upon cluster restart to bring these connectors > > back > > > up. > > > However, I'm not sure we want to persist recent restart attempts, the > > same > > > way we persist configurations. Additionally configurations might get > > > additional backups in order to allow for the config topic to be > recovered > > > for disaster recovery reasons. To that respect, I'm not sure that the > > > restart requests require the same guarantees. > > > > > > > This is not the first or only non-config record written to this topic, so > > there's already precedent. > > > > I acknowledge the precedent but we might need to draw a line at some point > and make sure we maintain more permanent information in the config topic > and more transient in the status topic, modulo any other concerns or > trade-offs. > As discussed above, I agree that the config topic is a better option here. > > > > > > > Given the above, would the status topic offer a better alternative in > > order > > > to broadcast the restart requests across the workers? Haven't looked > > > closely what would be the implications of writing these new records to > > the > > > status topic, but this topic tends to be more transient. > > > Or is it that you've preferred the config topic for the workers to be > > aware > > > of the interleaving between restart requests and requests for > > > reconfigurations? If that's the case an explicit call out might be > > > worthwhile in the KIP itself. > > > > > > > I guess I thought I already tried to do that. But I can add a bit more > > detail to the KIP to call this out even more. > > > > +1. Indeed there's some explanation on that already. Maybe a subsection > header and a juxtaposition with the status topic could help to highlight > the reason behind this selection in a more clear way. > I feel like the third Rejected Alternative (which I added after your first email review) is a better place for why we chose not to do something. Apologies for not mentioning that new Rejected Alternative in my previous email. > > > > > > > > > > > > 2) You mention in the KIP the status STOPPED. But this is not currently > > > part of the task or connector states that get persisted to the status > > > topic, as these are defined in AbstractStatus#State enum. Should we > > remove > > > any reference to a STOPPED state to avoid confusion? The only new state > > > seems to be the RESTARTING state. > > > > > > > Good catch. Removed references of the STOPPED state. > > > > And a few minor comments. > > > a) The sentence "We need to keep the existing behavior of the method to > > > just restart the Connector object must be retained for backward > > > compatibility," doesn't read well for me. > > > > > > > Reworded. > > > > > > > b) Subsection title: "Use REST API for Worker-to-Worker Communication > > and" > > > is possibly an unfinished sentence. > > > > > > > Ah, that was just a typo. Fixed. > > > > > > > c) In the paragraph right below, the last sentence "especially when the > > > number of workers " might also be missing something. > > > > > > > Fixed. > > > > > Thanks for the replies. > +1 overall. > > Konstantine > > > > > > > > > > > Best, > > > Konstantine > > > > > > On Thu, Jun 3, 2021 at 8:42 AM Kalpesh Patel > <kpa...@confluent.io.invalid > > > > > > wrote: > > > > > > > Agreed, this would be a great enhancement. > > > > > > > > On Thu, Jun 3, 2021 at 9:26 AM Nikita Kretov <kretov...@gmail.com> > > > wrote: > > > > > > > > > Hello! It's really not intuitive that you need to restart connect > and > > > > > tasks separately! I'd like to see this KIP landed in 3.0.0 release! > > > > > > > > > > On 6/2/21 7:40 PM, Randall Hauch wrote: > > > > > > Hello all, > > > > > > > > > > > > Many users struggle with the connector restart REST API only > > > restarting > > > > > the > > > > > > Connector instance rather than everything they associated with a > > > > "named" > > > > > > connector. I've created a KIP that attempts to solve this problem > > via > > > > new > > > > > > query parameters on an existing REST API method, though by > default > > it > > > > > > remains backward compatible with older behavior: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks > > > > > > > > > > > > Please take a look and respond here with questions. I'd love to > get > > > > this > > > > > > into AK 3.0.0, and the KIP freeze is coming up soon. > > > > > > > > > > > > Best regards, > > > > > > > > > > > > Randall > > > > > > > > > > > > > > > > > > > > >