Given that the conversation has lingered for a bit, I've gone ahead and opened up a PR with the initial implementation. Let me know your thoughts!
https://github.com/apache/kafka/pull/5002 Also, voting is open - so if you like this idea please send me some binding +1's before May 22nd so we can get this in Kafka 2.0 :) On Tue, Apr 17, 2018 at 7:11 PM, Matt Farmer <m...@frmr.me> wrote: > Hello all, I've updated a KIP again to add a few sentences about the > general problem we were facing in the motivation section. Please let me > know if there is any further feedback. > > On Tue, Apr 3, 2018 at 1:46 PM, Matt Farmer <m...@frmr.me> wrote: > >> Hey Randall, >> >> Devil's advocate sparring is always a fun game so I'm down. =) >> >> Rebalance caused by connectivity failure is the case that caused us to >> notice the issue. But the issue >> is really more about giving connectors the tools to facilitate rebalances >> or a Kafka connect shutdown >> cleanly. Perhaps that wasn't clear in the KIP. >> >> In our case timeouts were *not* uniformly affecting tasks. But every >> time a timeout occurred in one task, >> all tasks lost whatever forward progress they had made. So, yes, in the >> specific case of timeouts a >> backoff jitter in the connector is a solution for that particular issue. >> However, this KIP *also* gives connectors >> enough information to behave intelligently during any kind of rebalance >> or shutdown because they're able >> to discover that preCommit is being invoked for that specific reason. (As >> opposed to being invoked >> during normal operation.) >> >> On Tue, Apr 3, 2018 at 12:36 PM, Randall Hauch <rha...@gmail.com> wrote: >> >>> Matt, >>> >>> Let me play devil's advocate. Do we need this additional complexity? The >>> motivation section talked about needing to deal with task failures due to >>> connectivity problems. Generally speaking, isn't it possible that if one >>> task has connectivity problems with either the broker or the external >>> system that other tasks would as well? And in the particular case of S3, >>> is >>> it possible to try and prevent the task shutdown in the first place, >>> perhaps by improving how the S3 connector retries? (We did this in the >>> Elasticsearch connector using backoff with jitter; see >>> https://github.com/confluentinc/kafka-connect-elasticsearch/pull/116 for >>> details.) >>> >>> Best regards, >>> >>> Randall >>> >>> On Tue, Apr 3, 2018 at 8:39 AM, Matt Farmer <m...@frmr.me> wrote: >>> >>> > I have made the requested updates to the KIP! :) >>> > >>> > On Mon, Apr 2, 2018 at 11:02 AM, Matt Farmer <m...@frmr.me> wrote: >>> > >>> > > Ugh >>> > > >>> > > * I can update >>> > > >>> > > I need more caffeine... >>> > > >>> > > On Mon, Apr 2, 2018 at 11:01 AM, Matt Farmer <m...@frmr.me> wrote: >>> > > >>> > >> I'm can update the rejected alternatives section as you describe. >>> > >> >>> > >> Also, adding a paragraph to the preCommit javadoc also seems like a >>> > >> Very Very Good Idea™ so I'll make that update to the KIP as well. >>> > >> >>> > >> On Mon, Apr 2, 2018 at 10:48 AM, Randall Hauch <rha...@gmail.com> >>> > wrote: >>> > >> >>> > >>> Thanks for the KIP proposal, Matt. >>> > >>> >>> > >>> You mention in the "Rejected Alternatives" section that you >>> considered >>> > >>> changing the signature of the `preCommit` method but rejected it >>> > because >>> > >>> it >>> > >>> would break backward compatibility. Strictly speaking, it is >>> possible >>> > to >>> > >>> do >>> > >>> this without breaking compatibility by introducing a new >>> `preCommit` >>> > >>> method, deprecating the old one, and having the new implementation >>> call >>> > >>> the >>> > >>> old one. Such an approach would be complicated, and I'm not sure it >>> > adds >>> > >>> any value. In fact, one of the benefits of having a context object >>> is >>> > >>> that >>> > >>> we can add methods like the one you're proposing without causing >>> any >>> > >>> compatibility issues. Anyway, it probably is worth updating this >>> > rejected >>> > >>> alternative to be a bit more precise. >>> > >>> >>> > >>> Otherwise, I think this is a good approach, though I'd request >>> that we >>> > >>> update the `preCommit` JavaDoc to add a paragraph that explains >>> this >>> > >>> scenario. Thoughts? >>> > >>> >>> > >>> Randall >>> > >>> >>> > >>> On Wed, Mar 28, 2018 at 9:29 PM, Ted Yu <yuzhih...@gmail.com> >>> wrote: >>> > >>> >>> > >>> > I looked at WorkerSinkTask and it seems using a boolean for >>> KIP-275 >>> > >>> should >>> > >>> > suffice for now. >>> > >>> > >>> > >>> > Thanks >>> > >>> > >>> > >>> > On Wed, Mar 28, 2018 at 7:20 PM, Matt Farmer <m...@frmr.me> >>> wrote: >>> > >>> > >>> > >>> > > Hey Ted, >>> > >>> > > >>> > >>> > > I have not, actually! >>> > >>> > > >>> > >>> > > Do you think that we're likely to add multiple states here >>> soon? >>> > >>> > > >>> > >>> > > My instinct is to keep it simple until there are multiple >>> states >>> > >>> that we >>> > >>> > > would want >>> > >>> > > to consider. I really like the simplicity of just getting a >>> boolean >>> > >>> and >>> > >>> > the >>> > >>> > > implementation of WorkerSinkTask already passes around a >>> boolean to >>> > >>> > > indicate this is happening internally. We're really just >>> shuttling >>> > >>> that >>> > >>> > > value into >>> > >>> > > the context at the correct moments. >>> > >>> > > >>> > >>> > > Once we have multiple states, we could choose to provide a more >>> > >>> > > appropriately >>> > >>> > > named method (e.g. getState?) and reimplement isClosing by >>> checking >>> > >>> that >>> > >>> > > enum >>> > >>> > > without breaking compatibility. >>> > >>> > > >>> > >>> > > However, if we think multiple states here are imminent for some >>> > >>> reason, I >>> > >>> > > would >>> > >>> > > be pretty easy to convince adding that would be worth the extra >>> > >>> > complexity! >>> > >>> > > :) >>> > >>> > > >>> > >>> > > Matt >>> > >>> > > >>> > >>> > > — >>> > >>> > > Matt Farmer | Blog <http://farmdawgnation.com/> | Twitter >>> > >>> > > <http://twitter.com/farmdawgnation> >>> > >>> > > GPG: CD57 2E26 F60C 0A61 E6D8 FC72 4493 8917 D667 4D07 >>> > >>> > > >>> > >>> > > On Wed, Mar 28, 2018 at 10:02 PM, Ted Yu <yuzhih...@gmail.com> >>> > >>> wrote: >>> > >>> > > >>> > >>> > > > The enhancement gives SinkTaskContext state information. >>> > >>> > > > >>> > >>> > > > Have you thought of exposing the state retrieval as an enum >>> > >>> (initially >>> > >>> > > with >>> > >>> > > > two values) ? >>> > >>> > > > >>> > >>> > > > Thanks >>> > >>> > > > >>> > >>> > > > On Wed, Mar 28, 2018 at 6:55 PM, Matt Farmer <m...@frmr.me> >>> > wrote: >>> > >>> > > > >>> > >>> > > > > Hello all, >>> > >>> > > > > >>> > >>> > > > > I am proposing KIP-275 to improve Connect's >>> SinkTaskContext so >>> > >>> that >>> > >>> > > Sinks >>> > >>> > > > > can be informed >>> > >>> > > > > in their preCommit hook if the hook is being invoked as a >>> part >>> > >>> of a >>> > >>> > > > > rebalance or Connect >>> > >>> > > > > shutdown. >>> > >>> > > > > >>> > >>> > > > > The KIP is here: >>> > >>> > > > > https://cwiki.apache.org/confluence/pages/viewpage. >>> > >>> > > > action?pageId=75977607 >>> > >>> > > > > >>> > >>> > > > > Please let me know what feedback y'all have. Thanks! >>> > >>> > > > > >>> > >>> > > > >>> > >>> > > >>> > >>> > >>> > >>> >>> > >> >>> > >> >>> > > >>> > >>> >> >> >