Hi,

This KIP isn't implemented yet but that sounds good.

Thanks!
P

On Mon, Apr 29, 2024 at 3:17 AM Federico Valeri <fedeval...@gmail.com>
wrote:

> Hi Philip, thanks for this KIP. As other mentioned, I think it is
> useful in dynamic environment like Kubernetes.
>
> I was wondering if we should also update the "examples" modules with
> this new non-retriable exception. Wdyt?
>
> On Fri, Apr 21, 2023 at 11:13 PM Philip Nee <philip...@gmail.com> wrote:
> >
> > Hey Jason,
> >
> > Thanks again. I've updated the KIP.
> >
> > P
> >
> > On Fri, Apr 21, 2023 at 9:56 AM Jason Gustafson
> <ja...@confluent.io.invalid>
> > wrote:
> >
> > > Hey Philip, that sounds good to me.
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Thu, Apr 20, 2023 at 1:20 PM Philip Nee <philip...@gmail.com>
> wrote:
> > >
> > > > Hey Jason,
> > > >
> > > > Thanks for the response. I agree with your suggestions. Just to
> clarify,
> > > we
> > > > want the timeout, bootstrap.resolve.timeout.ms to only bound the DNS
> > > > resolution time, and we can throw a fatal,
> BootstrapResolutionException
> > > (so
> > > > not connection exception anymore) afterward.
> > > >
> > > > I think that aligns with the goal of this KIP.
> > > >
> > > > P
> > > >
> > > > On Thu, Apr 20, 2023 at 9:23 AM Jason Gustafson
> > > <ja...@confluent.io.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Hey Philip,
> > > > >
> > > > > Yeah, I see your point. Here is the challenge I'm considering.
> Today,
> > > the
> > > > > client does not forward connection-related errors to the
> application.
> > > It
> > > > is
> > > > > debatable whether it should, but that is the behavior that
> applications
> > > > > expect today. The only case of a fatal error is DNS resolution and
> the
> > > > > application must handle it because they cannot even construct the
> > > client.
> > > > > Once we have this KIP, we will introduce a separate fatal failure
> > > > mechanism
> > > > > through the `BootstrapConnectionException`. And we will use it not
> only
> > > > for
> > > > > DNS failures, but general bootstrap connection failures . The
> problem
> > > is
> > > > > that existing applications will not know that this should be
> treated
> > > as a
> > > > > fatal error. So they may continue to retry. Does it help in this
> > > > situation
> > > > > to poison the client so that it cannot be used? I'm not sure.
> Perhaps
> > > it
> > > > > would reduce the risk a bit if we change `
> > > > bootstrap.connection.timeout.ms`
> > > > > to `bootstrap.resolve.timeout.ms` or something like that? Then we
> > > retain
> > > > > the current behavior if DNS succeeds, but connections fail. I know
> I'm
> > > > the
> > > > > one that suggested generalizing the configuration, but I feel some
> > > > > hesitation after thinking about it more.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Wed, Apr 19, 2023 at 8:10 PM Philip Nee <philip...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hey Jason,
> > > > > >
> > > > > > Thanks for your review.  I think if we make it a retriable error,
> > > does
> > > > it
> > > > > > make sense to have a configurable timeout still? as we expect the
> > > user
> > > > to
> > > > > > continue to retry anyway.
> > > > > >
> > > > > > I'm considering the case of bad configuration. If the user
> retries
> > > the
> > > > > > error, then we rely on the error/warning to alert the user.  In
> this
> > > > > case,
> > > > > > maybe we continue using the proposed behavior, i.e. warn on each
> poll
> > > > > after
> > > > > > the timeout period.
> > > > > >
> > > > > > If you agree that a configuration is needed, maybe we can call
> this
> > > > > > *bootstrap.auto.retry.ms
> > > > > > <http://bootstrap.auto.retry.ms> *instead, to indicate a
> > > configurable
> > > > > > period of automatic retry. What do you think?
> > > > > >
> > > > > > Cheers,
> > > > > > P
> > > > > >
> > > > > > On Wed, Apr 19, 2023 at 7:17 PM Jason Gustafson
> > > > > <ja...@confluent.io.invalid
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Phillip,
> > > > > > >
> > > > > > > The KIP looks good. 5 minutes seems like a reasonable
> tradeoff. I
> > > do
> > > > > > wonder
> > > > > > > if it is necessary to treat bootstrap timeout as a fatal error
> > > > though.
> > > > > It
> > > > > > > seems possible that the exception might be caught by handlers
> in
> > > > > existing
> > > > > > > applications which may not expect that the client needs to be
> > > > > restarted.
> > > > > > > Perhaps it would be safer to make it retriable? As long as the
> > > > > > application
> > > > > > > continues trying to use the client, we could continue trying to
> > > reach
> > > > > the
> > > > > > > bootstrap servers perhaps? That would be closer to behavior
> today
> > > > which
> > > > > > > only treats DNS resolution failures as fatal. What do you
> think?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Mon, Apr 10, 2023 at 1:53 PM Philip Nee <
> philip...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, everyone: I'm starting a vote today.  Here's the
> recap
> > > for
> > > > > some
> > > > > > > of
> > > > > > > > the questions:
> > > > > > > >
> > > > > > > > John: I changed the proposal to throw a non-retriable
> exception
> > > > after
> > > > > > the
> > > > > > > > timeout elapses. I feel it might be necessary to poison the
> > > client
> > > > > > after
> > > > > > > > retry expires, as it might indicate a real issue.
> > > > > > > > Ismael: The proposal is to add a configuration for the retry
> and
> > > it
> > > > > > will
> > > > > > > > throw a non-retriable exception after the time expires.
> > > > > > > > Chris: Addressed some unclarity that you mentioned, and a
> new API
> > > > > won't
> > > > > > > be
> > > > > > > > introduced in this KIP.  Maybe up for future discussion.
> > > > > > > > Jason: I'm proposing adding a timeout config and a bootstrap
> > > > > exception
> > > > > > > per
> > > > > > > > your suggestion.
> > > > > > > > Kirk: I'm proposing throwing a non-retriable exception in the
> > > > network
> > > > > > > > client. See previous comment.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 9:36 AM Chris Egerton
> > > > > <chr...@aiven.io.invalid
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Philip,
> > > > > > > > >
> > > > > > > > > Yeah,  "DNS resolution should occur..." seems like a better
> > > fit.
> > > > 👍
> > > > > > > > >
> > > > > > > > > One other question I have is whether we should expose some
> kind
> > > > of
> > > > > > > public
> > > > > > > > > API for performing preflight validation of the bootstrap
> URLs.
> > > If
> > > > > we
> > > > > > > > change
> > > > > > > > > the behavior of a client configured with a silly typo
> (e.g.,
> > > > > > > > > "loclahost instead of localhost") from failing in the
> > > constructor
> > > > > to
> > > > > > > > > failing with a retriable exception, this might lead some
> client
> > > > > > > > > applications to handle that failure by, well, retrying. For
> > > > > > reference,
> > > > > > > > this
> > > > > > > > > is exactly what we do in Kafka Connect right now; see [1]
> and
> > > > [2].
> > > > > > IMO
> > > > > > > > it'd
> > > > > > > > > be nice to be able to opt into keeping the current
> behavior so
> > > > that
> > > > > > > > > projects like Connect could still do preflight checks of
> the
> > > > > > > > > bootstrap.servers property for connectors before starting
> them,
> > > > and
> > > > > > > > report
> > > > > > > > > any issues by failing fast instead of continuously writing
> > > > > > > warning/error
> > > > > > > > > messages to their logs.
> > > > > > > > >
> > > > > > > > > I'm not sure about where this new API could go, but a few
> > > options
> > > > > > might
> > > > > > > > be:
> > > > > > > > >
> > > > > > > > > - Expose a public variant of the existing ClientUtils class
> > > > > > > > > - Add static methods to the ConsumerConfig,
> ProducerConfig, and
> > > > > > > > > AdminClientConfig classes
> > > > > > > > > - Add those same static methods to the KafkaConsumer,
> > > > > KafkaProducer,
> > > > > > > and
> > > > > > > > > KafkaAdminClient classes
> > > > > > > > >
> > > > > > > > > If this seems reasonable, we should probably also specify
> in
> > > the
> > > > > KIP
> > > > > > > that
> > > > > > > > > Kafka Connect will leverage this preflight validation logic
> > > > before
> > > > > > > > > instantiating any Kafka clients for use by connectors or
> tasks,
> > > > and
> > > > > > > > > continue to fail fast if there are typos in the
> > > bootstrap.servers
> > > > > > > > property,
> > > > > > > > > or if temporary DNS resolution issues come up.
> > > > > > > > >
> > > > > > > > > [1] -
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/kafka/blob/5f9d01668cae64b2cacd7872d82964fa78862aaf/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L606
> > > > > > > > > [2] -
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L439
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > Chris
> > > > > > > > >
> > > > > > > > > On Fri, Feb 24, 2023 at 4:59 PM Philip Nee <
> > > philip...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Chris,
> > > > > > > > > >
> > > > > > > > > > Thanks for the quick response, and I apologize for the
> > > unclear
> > > > > > > wording
> > > > > > > > > > there, I guess "DNS lookup" would be a more appropriate
> > > wording
> > > > > > here.
> > > > > > > > So
> > > > > > > > > > what I meant there was, to delegate the DNS lookup in the
> > > > > > constructor
> > > > > > > > to
> > > > > > > > > > the network client poll, and it will happen on the very
> first
> > > > > poll.
> > > > > > > I
> > > > > > > > > > guess the logic could look like this:
> > > > > > > > > >
> > > > > > > > > > - if the client has been bootstrapped, do nothing.
> > > > > > > > > > - Otherwise, perform DNS lookup, and acquire the
> bootstrap
> > > > server
> > > > > > > > > address.
> > > > > > > > > >
> > > > > > > > > > Thanks for the comment there, I'll change up the wording.
> > > > Maybe
> > > > > > > revise
> > > > > > > > > it
> > > > > > > > > > as "DNS resolution should occur in the poll...." ?
> > > > > > > > > >
> > > > > > > > > > P
> > > > > > > > > >
> > > > > > > > > > On Fri, Feb 24, 2023 at 1:47 PM Chris Egerton
> > > > > > > <chr...@aiven.io.invalid
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Philip,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the KIP!
> > > > > > > > > > >
> > > > > > > > > > > QQ: In the "Proposed Changes" section, the KIP states
> that
> > > > > > > > > "Bootstrapping
> > > > > > > > > > > should now occur in the poll method before attempting
> to
> > > > update
> > > > > > the
> > > > > > > > > > > metadata. This includes resolving the addresses and
> > > > > bootstrapping
> > > > > > > the
> > > > > > > > > > > metadata.". By "bootstrapping the metadata" do we mean
> > > > actually
> > > > > > > > > > contacting
> > > > > > > > > > > the bootstrap servers, or just setting some internal
> state
> > > > > > related
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > current set of servers that can be contacted for
> metadata?
> > > I
> > > > > ask
> > > > > > > > > because
> > > > > > > > > > it
> > > > > > > > > > > seems like the language here implies the former, but if
> > > > that's
> > > > > > the
> > > > > > > > > case,
> > > > > > > > > > > this is already happening in poll (or at least, the
> first
> > > > > > > invocation
> > > > > > > > of
> > > > > > > > > > > it), and if it's the latter, it's probably not
> necessary to
> > > > > > mention
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > KIP since it doesn't really impact user-facing
> behavior. It
> > > > > also
> > > > > > > > seems
> > > > > > > > > > like
> > > > > > > > > > > that detail might impact how intertwined this and
> KIP-899
> > > > are,
> > > > > > > though
> > > > > > > > > the
> > > > > > > > > > > similarity could still be superficial either way.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > >
> > > > > > > > > > > Chris
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Feb 23, 2023 at 9:21 PM Philip Nee <
> > > > > philip...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey Ismael,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the feedback! The proposal is not to retry
> > > > > > > automatically
> > > > > > > > > but
> > > > > > > > > > > > relies on the user polling the NetworkClient
> (basically,
> > > > > > > > > consumer.poll)
> > > > > > > > > > > to
> > > > > > > > > > > > reattempt the bootstrap. If bootstrapping fails, a
> > > > > > > NetworkException
> > > > > > > > > > > > (retriable) will be thrown.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > > P
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Feb 23, 2023 at 1:34 PM Ismael Juma <
> > > > > ism...@juma.me.uk
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the KIP. Not sure if I missed it, but
> how
> > > long
> > > > > > will
> > > > > > > we
> > > > > > > > > > retry
> > > > > > > > > > > > for
> > > > > > > > > > > > > and when do we give up and propagate the failure
> to the
> > > > > user?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ismael
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Feb 23, 2023 at 9:30 AM Philip Nee <
> > > > > > > philip...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I want to start a discussion thread about how we
> can
> > > > > handle
> > > > > > > > > client
> > > > > > > > > > > > > > bootstrap failure due DNS lookup.  This requires
> a
> > > bit
> > > > of
> > > > > > > > > > behavioral
> > > > > > > > > > > > > > change, so a KIP is proposed and attached to this
> > > > email.
> > > > > > Let
> > > > > > > me
> > > > > > > > > > know
> > > > > > > > > > > > what
> > > > > > > > > > > > > > you think!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > *A small remark here*: *As the title of this KIP
> > > might
> > > > > > sound
> > > > > > > > > > > > > > familiar/similar to KIP-899, it is not the same.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > *In Summary:* I want to propose a KIP to change
> the
> > > > > > existing
> > > > > > > > > > > bootstrap
> > > > > > > > > > > > > > (upon instantiation) strategy because it is
> > > reasonable
> > > > to
> > > > > > > allow
> > > > > > > > > > > clients
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > retry
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > *KIP: *
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-909%3A+Allow+Clients+to+Rebootstrap+Upon+Failed+DNS+Resolution
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > > Philip
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Reply via email to