Hi Cliff,

I think we're all agreed that the current contract of poll() should be kept. The consumer wouldn't wait for max messages to become available in this proposal; it would only ensure that it never returns more than max messages.
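
To make that concrete, here's a rough sketch of what the contract looks like from the application side. It assumes the cap is exposed as a consumer config (named "max.poll.records" below, though the name isn't settled in this thread) alongside the existing session.timeout.ms; everything else is the stock 0.9 consumer API:

    // Sketch only: "max.poll.records" is an assumed property name, not a
    // decision from this thread.
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class MaxRecordsExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("session.timeout.ms", "30000"); // tuned together with the cap below
            props.put("max.poll.records", "500");     // hypothetical cap being discussed

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Arrays.asList("my-topic"));
                while (true) {
                    // poll() does not wait for 500 records to accumulate; it returns
                    // whatever is available within the timeout and only guarantees it
                    // never returns more than the configured maximum.
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        handle(record); // application-specific processing
                    }
                    consumer.commitSync();
                }
            }
        }

        private static void handle(ConsumerRecord<String, String> record) {
        }
    }
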
-Jason

On Mon, Jan 4, 2016 at 11:52 AM, Cliff Rhyne <crh...@signal.co> wrote:

> Instead of a heartbeat, I'd prefer poll() to return whatever messages the
> client has. Either a) I don't care if I get less than my max message limit
> or b) I do care and will set a larger timeout. Case B is less common than
> A and is fairly easy to handle in the application's code.
>
> On Mon, Jan 4, 2016 at 1:47 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > 1. Agree that TCP window style scaling would be cool. I'll try to think
> > of a good excuse to use it ;)
> >
> > 2. I'm very concerned about the challenges of getting the timeouts,
> > heartbeats and max messages right.
> >
> > Another option could be to expose a "heartbeat" API to consumers. If my
> > app is busy processing data but is still alive, it could initiate a
> > heartbeat to signal it's alive without having to handle additional
> > messages.
> >
> > I don't know if this improves more than it complicates, though :(
> >
> > On Mon, Jan 4, 2016 at 11:40 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Gwen,
> > >
> > > I was thinking along the lines of TCP window scaling in order to
> > > dynamically find a good consumption rate. Basically you'd start off
> > > consuming say 100 records and you'd let it increase until the
> > > consumption took longer than half the session timeout (for example).
> > > You /might/ be able to achieve the same thing using pause/resume, but
> > > it would be a lot trickier since you have to do it at the granularity
> > > of partitions. But yeah, database write performance doesn't always
> > > scale in a predictable enough way to accommodate this, so I'm not sure
> > > how useful it would be in practice. It might also be more difficult to
> > > implement since it wouldn't be as clear when to initiate the next
> > > fetch. With a static setting, the consumer knows exactly how many
> > > records will be returned on the next call to poll() and can send
> > > fetches accordingly.
> > >
> > > On the other hand, I do feel a little wary of the need to tune the
> > > session timeout and max messages, since these settings might depend on
> > > the environment that the consumer is deployed in. It wouldn't be a big
> > > deal if the impact were relatively minor, but getting them wrong can
> > > cause a lot of rebalance churn which could keep the consumer from
> > > making any progress. It's not a particularly graceful failure.
> > >
> > > -Jason
> > >
> > > On Mon, Jan 4, 2016 at 10:49 AM, Gwen Shapira <g...@confluent.io>
> > > wrote:
> > >
> > > > I can't speak to all use cases, but for the database one, I think
> > > > pause/resume will be necessary in any case, and therefore dynamic
> > > > batch sizes are not needed.
> > > >
> > > > Databases are really unpredictable regarding response times - load
> > > > and locking can affect this. I'm not sure there's a good way to know
> > > > you are going into rebalance hell before it is too late. So if I were
> > > > writing code that updates an RDBMS based on Kafka, I'd pick a
> > > > reasonable batch size (say 5000 records), and basically pause,
> > > > batch-insert all records, commit and resume.
> > > >
> > > > Does that make sense?
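
To make the pause / batch-insert / commit / resume pattern above concrete, here's a rough sketch against the 0.9 Java consumer. The 5000-record batch size, the chunking, the PauseResumeBatcher class and insertChunk() are placeholders for the application's own logic, not part of any proposal:

    // Rough sketch of pause / batch-insert / commit / resume.
    // Note: 0.9 pause()/resume() take varargs; later client versions take a Collection.
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PauseResumeBatcher {
        private static final int BATCH_SIZE = 5000;
        private static final int CHUNK_SIZE = 500;

        public static void run(KafkaConsumer<String, String> consumer) {
            List<ConsumerRecord<String, String>> batch = new ArrayList<>();
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(100)) {
                    batch.add(record);
                }
                if (batch.size() >= BATCH_SIZE) {
                    TopicPartition[] assigned =
                            consumer.assignment().toArray(new TopicPartition[0]);
                    consumer.pause(assigned);
                    for (int i = 0; i < batch.size(); i += CHUNK_SIZE) {
                        int end = Math.min(i + CHUNK_SIZE, batch.size());
                        insertChunk(batch.subList(i, end)); // bulk insert into the RDBMS
                        // While paused, poll() still heartbeats but fetches nothing,
                        // so a slow insert is less likely to trigger a rebalance.
                        consumer.poll(0);
                    }
                    consumer.commitSync(); // commit only after the inserts succeed
                    batch.clear();
                    consumer.resume(assigned);
                }
            }
        }

        private static void insertChunk(List<ConsumerRecord<String, String>> chunk) {
            // application-specific bulk insert
        }
    }
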
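And for the TCP-window-style idea Jason describes above (grow the batch while processing stays well within the session timeout, halve it when the downstream system slows down), the adjustment policy itself is small. This is only an illustrative sketch; the AdaptiveBatchSize class and its numbers are made up here, and how the target would actually be applied is an open question, since the proposal on the table is a static config:

    // Sketch of an adaptive batch-size policy: additive increase while we keep
    // up, multiplicative decrease when a batch takes too long relative to the
    // session timeout. Illustrative only.
    public class AdaptiveBatchSize {
        private final int minRecords;
        private final int maxRecords;
        private final long sessionTimeoutMs;
        private int target;

        public AdaptiveBatchSize(int minRecords, int maxRecords, long sessionTimeoutMs) {
            this.minRecords = minRecords;
            this.maxRecords = maxRecords;
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.target = minRecords; // start small, e.g. 100 records
        }

        /** Adjust the target after processing a batch that took elapsedMs. */
        public void onBatchProcessed(int batchSize, long elapsedMs) {
            if (elapsedMs > sessionTimeoutMs / 2) {
                // Downstream (e.g. the database) is slow: back off multiplicatively
                // to stay well clear of a rebalance.
                target = Math.max(minRecords, target / 2);
            } else if (batchSize >= target) {
                // We kept up with a full batch: probe upwards additively.
                target = Math.min(maxRecords, target + minRecords);
            }
        }

        public int target() {
            return target;
        }
    }
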
> > > >
> > > > On Mon, Jan 4, 2016 at 10:37 AM, Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Gwen and Ismael,
> > > > >
> > > > > I agree the configuration option is probably the way to go, but I
> > > > > was wondering whether there would be cases where it made sense to
> > > > > let the consumer dynamically set max messages to adjust for
> > > > > downstream slowness. For example, if the consumer is writing
> > > > > consumed records to another database, and that database is
> > > > > experiencing heavier than expected load, then the consumer could
> > > > > halve its current max messages in order to adapt without risking
> > > > > rebalance hell. It could then increase max messages as the load on
> > > > > the database decreases. It's basically an easier way to handle flow
> > > > > control than we provide with pause/resume.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Mon, Jan 4, 2016 at 9:46 AM, Gwen Shapira <g...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > The wiki you pointed to is no longer maintained and fell out of
> > > > > > sync with the code and protocol.
> > > > > >
> > > > > > You may want to refer to:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
> > > > > >
> > > > > > On Mon, Jan 4, 2016 at 4:38 AM, Jens Rantil <jens.ran...@tink.se>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi guys,
> > > > > > >
> > > > > > > I realized I never thanked y'all for your input - thanks!
> > > > > > > Jason: I apologize for assuming your stance on the issue! Feels
> > > > > > > like we all agreed on the solution. +1
> > > > > > >
> > > > > > > Follow-up: Jason made a point about defining prefetch and
> > > > > > > fairness behaviour in the KIP. I am now working on putting that
> > > > > > > down in writing. To be able to do this I think I need to
> > > > > > > understand the current prefetch behaviour in the new consumer
> > > > > > > API (0.9) a bit better. Some specific questions:
> > > > > > >
> > > > > > > - How does a specific consumer balance incoming messages from
> > > > > > > multiple partitions? Is the consumer simply issuing Multi-Fetch
> > > > > > > requests[1] for its assigned partitions of the relevant topics?
> > > > > > > Or is the consumer fetching from one partition at a time and
> > > > > > > balancing between them internally? That is, is the
> > > > > > > responsibility of partition balancing (and fairness) on the
> > > > > > > broker side or the consumer side?
> > > > > > > - Is the above documented somewhere?
> > > > > > >
> > > > > > > [1]
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Writing+a+Driver+for+Kafka
> > > > > > > , see "Multi-Fetch".
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jens
> > > > > > >
> > > > > > > On Wed, Dec 23, 2015 at 2:44 AM, Ismael Juma <ism...@juma.me.uk>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On Wed, Dec 23, 2015 at 1:24 AM, Gwen Shapira <g...@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Given the background, it sounds like you'll generally want
> > > > > > > > > each call to poll() to return the same number of events
> > > > > > > > > (which is the number you planned on having enough memory /
> > > > > > > > > time for). It also sounds like tuning the number of events
> > > > > > > > > will be closely tied to tuning the session timeout. That is -
> > > > > > > > > if I choose to lower the session timeout for some reason, I
> > > > > > > > > will have to modify the number of records returned too.
> > > > > > > > >
> > > > > > > > > If those assumptions are correct, I think a configuration
> > > > > > > > > makes more sense.
> > > > > > > > > 1. We are unlikely to want this parameter to change during
> > > > > > > > > the lifetime of the consumer.
> > > > > > > > > 2. The correct value is tied to another configuration
> > > > > > > > > parameter, so they will be controlled together.
> > > > > > > >
> > > > > > > > I was thinking the same thing.
> > > > > > > >
> > > > > > > > Ismael
> > > > > > >
> > > > > > > --
> > > > > > > Jens Rantil
> > > > > > > Backend engineer
> > > > > > > Tink AB
> > > > > > >
> > > > > > > Email: jens.ran...@tink.se
> > > > > > > Phone: +46 708 84 18 32
> > > > > > > Web: www.tink.se
> > > > > > >
> > > > > > > Facebook <https://www.facebook.com/#!/tink.se> Linkedin
> > > > > > > <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
> > > > > > > Twitter <https://twitter.com/tink>
>
> --
> Cliff Rhyne
> Software Engineering Lead
> e: crh...@signal.co
> signal.co
> ________________________
>
> Cut Through the Noise
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized use of this email is strictly prohibited.
> ©2015 Signal. All rights reserved.