Thanks, everyone: I'm starting a vote today.  Here's the recap for some of
the questions:

John: I changed the proposal to throw a non-retriable exception after the
timeout elapses. I feel it might be necessary to poison the client after
retry expires, as it might indicate a real issue.
Ismael: The proposal is to add a configuration for the retry and it will
throw a non-retriable exception after the time expires.
Chris: Addressed some unclarity that you mentioned, and a new API won't be
introduced in this KIP.  Maybe up for future discussion.
Jason: I'm proposing adding a timeout config and a bootstrap exception per
your suggestion.
Kirk: I'm proposing throwing a non-retriable exception in the network
client. See previous comment.


On Mon, Feb 27, 2023 at 9:36 AM Chris Egerton <chr...@aiven.io.invalid>
wrote:

> Hi Philip,
>
> Yeah,  "DNS resolution should occur..." seems like a better fit. 👍
>
> One other question I have is whether we should expose some kind of public
> API for performing preflight validation of the bootstrap URLs. If we change
> the behavior of a client configured with a silly typo (e.g.,
> "loclahost instead of localhost") from failing in the constructor to
> failing with a retriable exception, this might lead some client
> applications to handle that failure by, well, retrying. For reference, this
> is exactly what we do in Kafka Connect right now; see [1] and [2]. IMO it'd
> be nice to be able to opt into keeping the current behavior so that
> projects like Connect could still do preflight checks of the
> bootstrap.servers property for connectors before starting them, and report
> any issues by failing fast instead of continuously writing warning/error
> messages to their logs.
>
> I'm not sure about where this new API could go, but a few options might be:
>
> - Expose a public variant of the existing ClientUtils class
> - Add static methods to the ConsumerConfig, ProducerConfig, and
> AdminClientConfig classes
> - Add those same static methods to the KafkaConsumer, KafkaProducer, and
> KafkaAdminClient classes
>
> If this seems reasonable, we should probably also specify in the KIP that
> Kafka Connect will leverage this preflight validation logic before
> instantiating any Kafka clients for use by connectors or tasks, and
> continue to fail fast if there are typos in the bootstrap.servers property,
> or if temporary DNS resolution issues come up.
>
> [1] -
>
> https://github.com/apache/kafka/blob/5f9d01668cae64b2cacd7872d82964fa78862aaf/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L606
> [2] -
>
> https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L439
>
> Cheers,
>
> Chris
>
> On Fri, Feb 24, 2023 at 4:59 PM Philip Nee <philip...@gmail.com> wrote:
>
> > Hey Chris,
> >
> > Thanks for the quick response, and I apologize for the unclear wording
> > there, I guess "DNS lookup" would be a more appropriate wording here. So
> > what I meant there was, to delegate the DNS lookup in the constructor to
> > the network client poll, and it will happen on the very first poll.  I
> > guess the logic could look like this:
> >
> > - if the client has been bootstrapped, do nothing.
> > - Otherwise, perform DNS lookup, and acquire the bootstrap server
> address.
> >
> > Thanks for the comment there, I'll change up the wording.  Maybe revise
> it
> > as "DNS resolution should occur in the poll...." ?
> >
> > P
> >
> > On Fri, Feb 24, 2023 at 1:47 PM Chris Egerton <chr...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Philip,
> > >
> > > Thanks for the KIP!
> > >
> > > QQ: In the "Proposed Changes" section, the KIP states that
> "Bootstrapping
> > > should now occur in the poll method before attempting to update the
> > > metadata. This includes resolving the addresses and bootstrapping the
> > > metadata.". By "bootstrapping the metadata" do we mean actually
> > contacting
> > > the bootstrap servers, or just setting some internal state related to
> the
> > > current set of servers that can be contacted for metadata? I ask
> because
> > it
> > > seems like the language here implies the former, but if that's the
> case,
> > > this is already happening in poll (or at least, the first invocation of
> > > it), and if it's the latter, it's probably not necessary to mention in
> > the
> > > KIP since it doesn't really impact user-facing behavior. It also seems
> > like
> > > that detail might impact how intertwined this and KIP-899 are, though
> the
> > > similarity could still be superficial either way.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Feb 23, 2023 at 9:21 PM Philip Nee <philip...@gmail.com>
> wrote:
> > >
> > > > Hey Ismael,
> > > >
> > > > Thanks for the feedback! The proposal is not to retry automatically
> but
> > > > relies on the user polling the NetworkClient (basically,
> consumer.poll)
> > > to
> > > > reattempt the bootstrap. If bootstrapping fails, a NetworkException
> > > > (retriable) will be thrown.
> > > >
> > > > Thanks!
> > > > P
> > > >
> > > >
> > > >
> > > > On Thu, Feb 23, 2023 at 1:34 PM Ismael Juma <ism...@juma.me.uk>
> wrote:
> > > >
> > > > > Thanks for the KIP. Not sure if I missed it, but how long will we
> > retry
> > > > for
> > > > > and when do we give up and propagate the failure to the user?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Thu, Feb 23, 2023 at 9:30 AM Philip Nee <philip...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > I want to start a discussion thread about how we can handle
> client
> > > > > > bootstrap failure due DNS lookup.  This requires a bit of
> > behavioral
> > > > > > change, so a KIP is proposed and attached to this email. Let me
> > know
> > > > what
> > > > > > you think!
> > > > > >
> > > > > >
> > > > > > *A small remark here*: *As the title of this KIP might sound
> > > > > > familiar/similar to KIP-899, it is not the same.*
> > > > > >
> > > > > > *In Summary:* I want to propose a KIP to change the existing
> > > bootstrap
> > > > > > (upon instantiation) strategy because it is reasonable to allow
> > > clients
> > > > > to
> > > > > > retry
> > > > > >
> > > > > > *KIP: *
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-909%3A+Allow+Clients+to+Rebootstrap+Upon+Failed+DNS+Resolution
> > > > > >
> > > > > > Thanks!
> > > > > > Philip
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to