Sounds good, I agree that should not make a big difference in practice.

On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bby...@confluent.io> wrote:

> Hi Guozhang,
>
> I see, we agree on the topic threshold not applying to urgent topics, but
> differ slightly on what should be considered urgent. I would argue that we
> should consider topics nearing the metadata.max.age.ms to be urgent since
> they may still be well within the metadata.expiry.ms. That is, the client
> still considers these topics to be relevant (not expired), but doesn't want
> to incur the latency bubble of having to wait for the metadata to be
> re-fetched if it's stale. This could be a frequent case if the
> metadata.max.age.ms << metadata.expiry.ms.
>
> In practice, I wouldn't expect this to make a noticeable difference so I
> don't have a strong leaning, but the current behavior today is to
> aggressively refresh the metadata of stale topics by ensuring a refresh is
> triggered before that metadata.max.age.ms duration elapses.
>
> Thanks,
> Brian
>
>
> On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Brian,
> >
> > Thanks for your explanation, could you then update the wiki page for the
> > algorithm part since when I read it, I thought it was different from the
> > above, e.g. urgent topics should not be added just because of max.age
> > expiration, but should only be added if there are sending data pending.
> >
> >
> > Guozhang
> >
> > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bby...@confluent.io> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Thanks for the feedback!
> > >
> > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> > >
> > > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>
> > > *should
> > > > be included in the public interface. Also its semantics needs more
> > > > clarification (since previously it is hard-coded internally we do not
> > > need
> > > > to explain it publicly, but now with the configurable value we do
> > need).
> > > >
> > >
> > > This was an oversight. Done.
> > >
> > >
> > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> > > > proposal, maybe we need to explain why these magic values makes sense
> > in
> > > > common scenarios.
> > > >
> > >
> > > So these are pretty fuzzy numbers, and seemed to be a decent balance
> > > between trade-offs. I've updated the target size to account for setups
> > with
> > > a large number of topics or a shorter refresh time, as well as added
> some
> > > light rationale.
> > >
> > >
> > > > 3. In the Urgent set condition, do you actually mean "with no cached
> > > > metadata AND there are existing data buffered for the topic"?
> > > >
> > >
> > > Yes, fixed.
> > >
> > >
> > >
> > > > One concern I have is whether or not we may introduce a regression,
> > > > especially during producer startup such that since we only require up
> > to
> > > 25
> > > > topics each request, it may cause the send data to be buffered more
> > time
> > > > than now due to metadata not available. I understand this is a
> > > acknowledged
> > > > trade-off in our design but any regression that may surface to users
> > need
> > > > to be very carefully considered. I'm wondering, e.g. if we can tweak
> > our
> > > > algorithm for the Urgent set, e.g. to consider those with non cached
> > > > metadata have higher priority than those who have elapsed max.age but
> > not
> > > > yet have been called for sending. More specifically:
> > > >
> > > > Urgent: topics that have been requested for sending but no cached
> > > metadata,
> > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > Non-urgent: topics that are not in Urgent but have expired max.age.
> > > >
> > > > Then when sending metadata, we always send ALL in the urgent (i.e.
> > ignore
> > > > the size limit), and only when they do not exceed the size limit,
> > > consider
> > > > fill in more topics from Non-urgent up to the size limit.
> > > >
> > >
> > > I think we're on the same page here. Urgent topics ignore the target
> > > metadata RPC size, and are not bound by it in any way, i.e. if there's
> > 100
> > > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> > however,
> > > if a topic transitions to urgent and there's several non-urgent ones,
> > we'll
> > > piggyback the non-urgent updates up to the target size.
> > >
> > > Thanks,
> > >  Brian
> > >
> > >
> > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <dengziming1...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think it's ok, and you can add another issue about `asynchronous
> > > > > metadata` if `topic expiry` is not enough.
> > > > >
> > > > >
> > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bby...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > I've refactored the KIP to remove implementing asynchronous
> > metadata
> > > > > > fetching in the producer during send(). It's now exclusively
> > focused
> > > on
> > > > > > reducing the topic metadata fetch payload and proposes adding a
> new
> > > > > > configuration flag to control topic expiry behavior. Please take
> a
> > > look
> > > > > > when possible.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > >
> > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bby...@confluent.io
> >
> > > > wrote:
> > > > > >
> > > > > > > Lucas, Guozhang,
> > > > > > >
> > > > > > > Thank you for the comments. Good point on
> > METADATA_MAX_AGE_CONFIG -
> > > > it
> > > > > > > looks like the ProducerMetadata was differentiating between
> > expiry
> > > > and
> > > > > > > refresh, but it should be unnecessary to do so once the cost of
> > > > > fetching
> > > > > > a
> > > > > > > single topic's metadata is reduced.
> > > > > > >
> > > > > > > I've updated the rejected alternatives and removed the config
> > > > > variables.
> > > > > > >
> > > > > > > Brian
> > > > > > >
> > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> wangg...@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Hello Brian,
> > > > > > >>
> > > > > > >> Thanks for the KIP.
> > > > > > >>
> > > > > > >> I think using asynchronous metadata update to address 1)
> > metadata
> > > > > update
> > > > > > >> blocking send, but for other issues, currently at producer we
> do
> > > > have
> > > > > a
> > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by
> > > > default
> > > > > > is
> > > > > > >> 5min. So maybe we do not need to introduce new configs here,
> but
> > > > only
> > > > > > >> change the semantics of that config from global expiry (today
> we
> > > > just
> > > > > > >> enforce a full metadata update for the whole cluster) to
> > > > single-topic
> > > > > > >> expiry, and we can also extend its expiry deadline whenever
> that
> > > > > > metadata
> > > > > > >> is successfully used to send a produce request.
> > > > > > >>
> > > > > > >>
> > > > > > >> Guozhang
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > lu...@confluent.io
> > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Brian,
> > > > > > >> >
> > > > > > >> > This looks great, and should help reduce blocking and high
> > > > metadata
> > > > > > >> request
> > > > > > >> > volumes when the producer is sending to large numbers of
> > topics,
> > > > > > >> especially
> > > > > > >> > at low volumes. I think the approach to make metadata
> fetching
> > > > > > >> asynchronous
> > > > > > >> > and batch metadata requests together will help
> significantly.
> > > > > > >> >
> > > > > > >> > The only other approach I can think of is to allow users to
> > > supply
> > > > > the
> > > > > > >> > producer with the expected topics upfront, allowing the
> > producer
> > > > to
> > > > > > >> perform
> > > > > > >> > a single initial metadata request before any sends occur. I
> > see
> > > no
> > > > > > real
> > > > > > >> > advantages to this approach compared to the async method
> > you’ve
> > > > > > >> proposed,
> > > > > > >> > but maybe we could add it to the rejected alternatives
> > section?
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Lucas
> > > > > > >> >
> > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > bby...@confluent.io>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > I've updated the 'Proposed Changes' to include two new
> > > producer
> > > > > > >> > > configuration variables: topic.expiry.ms and
> > topic.refresh.ms
> > > .
> > > > > > Please
> > > > > > >> > take
> > > > > > >> > > a look.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Brian
> > > > > > >> > >
> > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > bby...@confluent.io
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > >
> > > > > > >> > > > Dev team,
> > > > > > >> > > >
> > > > > > >> > > > Requesting discussion for improvement to the producer
> when
> > > > > dealing
> > > > > > >> > with a
> > > > > > >> > > > large number of topics.
> > > > > > >> > > >
> > > > > > >> > > > KIP:
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > >> > > >
> > > > > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > >> > > >
> > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > > Brian
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> -- Guozhang
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Reply via email to