Re: KIP-41: KafkaConsumer Max Records

Cliff Rhyne Mon, 04 Jan 2016 11:00:20 -0800

My team's main use-case benefits from flexibility between consumers, which
both options handle well.  If the poll timeout triggers before the full
batch size, I'd still want to get whatever is available.


On Mon, Jan 4, 2016 at 12:49 PM, Gwen Shapira <[email protected]> wrote:

> I can't speak to all use-cases, but for the database one, I think
> pause-resume will be necessary in any case, and therefore dynamic batch
> sizes are not needed.
>
> Databases are really unexpected regarding response times - load and locking
> can affect this. I'm not sure there's a good way to know you are going into
> rebalance hell before it is too late. So if I were writing code that
> updates an RDBMS based on Kafka, I'd pick a reasonable batch size (say 5000
> records), and basically pause, batch-insert all records, commit and resume.
>
> Does that make sense?
>
> On Mon, Jan 4, 2016 at 10:37 AM, Jason Gustafson <[email protected]>
> wrote:
>
> > Gwen and Ismael,
> >
> > I agree the configuration option is probably the way to go, but I was
> > wondering whether there would be cases where it made sense to let the
> > consumer dynamically set max messages to adjust for downstream slowness.
> > For example, if the consumer is writing consumed records to another
> > database, and that database is experiencing heavier than expected load,
> > then the consumer could halve its current max messages in order to adapt
> > without risking rebalance hell. It could then increase max messages as
> the
> > load on the database decreases. It's basically an easier way to handle
> flow
> > control than we provide with pause/resume.
> >
> > -Jason
> >
> > On Mon, Jan 4, 2016 at 9:46 AM, Gwen Shapira <[email protected]> wrote:
> >
> > > The wiki you pointed to is no longer maintained and fell out of sync
> with
> > > the code and protocol.
> > >
> > > You may want  to refer to:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
> > >
> > > On Mon, Jan 4, 2016 at 4:38 AM, Jens Rantil <[email protected]>
> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I realized I never thanked yall for your input - thanks!
> > > > Jason: I apologize for assuming your stance on the issue! Feels like
> we
> > > all
> > > > agreed on the solution. +1
> > > >
> > > > Follow-up: Jason made a point about defining prefetch and fairness
> > > > behaviour in the KIP. I am now working on putting that down in
> writing.
> > > To
> > > > do be able to do this I think I need to understand the current
> prefetch
> > > > behaviour in the new consumer API (0.9) a bit better. Some specific
> > > > questions:
> > > >
> > > >    - How does a specific consumer balance incoming messages from
> > multiple
> > > >    partitions? Is the consumer simply issuing Multi-Fetch requests[1]
> > for
> > > > the
> > > >    consumed assigned partitions of the relevant topics? Or is the
> > > consumer
> > > >    fetching from one partition at a time and balancing between them
> > > >    internally? That is, is the responsibility of partition balancing
> > (and
> > > >    fairness) on the broker side or consumer side?
> > > >    - Is the above documented somewhere?
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Writing+a+Driver+for+Kafka
> > > > ,
> > > > see "Multi-Fetch".
> > > >
> > > > Thanks,
> > > > Jens
> > > >
> > > > On Wed, Dec 23, 2015 at 2:44 AM, Ismael Juma <[email protected]>
> > wrote:
> > > >
> > > > > On Wed, Dec 23, 2015 at 1:24 AM, Gwen Shapira <[email protected]>
> > > wrote:
> > > > >
> > > > > > Given the background, it sounds like you'll generally want each
> > call
> > > to
> > > > > > poll() to return the same number of events (which is the number
> you
> > > > > planned
> > > > > > on having enough memory / time for). It also sounds like tuning
> the
> > > > > number
> > > > > > of events will be closely tied to tuning the session timeout.
> That
> > > is -
> > > > > if
> > > > > > I choose to lower the session timeout for some reason, I will
> have
> > to
> > > > > > modify the number of records returning too.
> > > > > >
> > > > > > If those assumptions are correct, I think a configuration makes
> > more
> > > > > sense.
> > > > > > 1. We are unlikely to want this parameter to be change at the
> > > lifetime
> > > > of
> > > > > > the consumer
> > > > > > 2. The correct value is tied to another configuration parameter,
> so
> > > > they
> > > > > > will be controlled together.
> > > > > >
> > > > >
> > > > > I was thinking the same thing.
> > > > >
> > > > > Ismael
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Jens Rantil
> > > > Backend engineer
> > > > Tink AB
> > > >
> > > > Email: [email protected]
> > > > Phone: +46 708 84 18 32
> > > > Web: www.tink.se
> > > >
> > > > Facebook <https://www.facebook.com/#!/tink.se> Linkedin
> > > > <
> > > >
> > >
> >
> http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary
> > > > >
> > > >  Twitter <https://twitter.com/tink>
> > > >
> > >
> >
>



-- 
Cliff Rhyne
Software Engineering Lead
e: [email protected]
signal.co
________________________

Cut Through the Noise

This e-mail and any files transmitted with it are for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized use of this email is strictly prohibited.
©2015 Signal. All rights reserved.

Re: KIP-41: KafkaConsumer Max Records

Reply via email to