Hey Jens,

I'm not sure I understand why increasing the session timeout is not an
option. Is the issue that there's too much uncertainty about processing
time to set an upper bound for each round of the poll loop?

One general workaround would be to move the processing into another thread.
For example, you could add messages to a blocking queue and have the
processor poll them. We have a pause() API which can be used to prevent
additional fetches when the queue is full. This would allow you to continue
the poll loop instead of blocking in offer(). The tricky thing would be
handling rebalances since you'd need to clear the queue and get the last
offset from the processors. Enough people will probably end up doing
something like this that we should work through an example of how to do it
properly.
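
A rough, untested sketch of that pattern (assuming the 0.9.0 consumer API,
where pause()/resume() take TopicPartition varargs; the class name, queue
capacity and poll timeout are just illustrative, and the rebalance handling
mentioned above is omitted):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PausingPollLoop {
  private final KafkaConsumer<String, byte[]> consumer;
  // Bounded hand-off queue drained by a separate processor thread.
  private final BlockingQueue<ConsumerRecord<String, byte[]>> queue =
      new ArrayBlockingQueue<>(1000);
  // Records fetched but not yet accepted by the full queue.
  private final Queue<ConsumerRecord<String, byte[]>> overflow = new ArrayDeque<>();

  public PausingPollLoop(KafkaConsumer<String, byte[]> consumer) {
    this.consumer = consumer;
  }

  public void run() {
    while (true) {
      // Keep calling poll() so the consumer heartbeats even while paused.
      ConsumerRecords<String, byte[]> records = consumer.poll(100);
      for (ConsumerRecord<String, byte[]> record : records) {
        overflow.add(record);
      }
      // Move as much as currently fits into the queue without blocking.
      while (!overflow.isEmpty() && queue.offer(overflow.peek())) {
        overflow.remove();
      }
      TopicPartition[] assigned =
          consumer.assignment().toArray(new TopicPartition[0]);
      if (!overflow.isEmpty()) {
        // Queue is full: stop fetching more data but stay in the poll loop.
        consumer.pause(assigned);
      } else {
        consumer.resume(assigned);
      }
    }
  }
}

The processor thread would take() from the queue, and it would need to
report the last offset it finished per partition so a rebalance listener
could commit those offsets and clear the queue when partitions are revoked.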

-Jason

On Sun, Dec 13, 2015 at 8:35 AM, Jens Rantil <jens.ran...@tink.se> wrote:

> Hi again,
>
>
> For the record I filed an issue about this here:
> https://issues.apache.org/jira/browse/KAFKA-2986
>
> Cheers,
>
> Jens
>
> –
> Sent from Mailbox
>
> On Fri, Dec 11, 2015 at 7:56 PM, Jens Rantil <jens.ran...@tink.se> wrote:
>
> > Hi,
> > We've been experimenting a little with running Kafka internally to
> > better handle temporary throughput peaks of asynchronous tasks. However,
> > we've had big issues making Kafka work for us and I am starting to
> > question whether it's a good fit.
> > Our use case:
> >    - High latency. At peaks, each consumer requires ~20 seconds to
> >    handle a single message/task.
> >    - Extreme variation in message size: serialized tasks range from
> >    ~300 bytes up to ~3 MB.
> >    - Processing generally takes ~20 seconds regardless of message size.
> >    Most messages are small.
> > Our setup:
> >    - Kafka 0.9.0.
> >    - Using the new Java consumer API (consumer groups etc.).
> >    - To occasionally handle large (3 MB) messages we've had to set the
> >    following configuration parameters:
> >       - max.partition.fetch.bytes=10485760 (10 MB) on the consumer to
> >       handle larger messages.
> >       - session.timeout.ms=30000 (30 seconds) to handle our high-latency
> >       processing.
> >       - replica.fetch.max.bytes=10485760 (10 MB) on the broker.
> >       - message.max.bytes=10485760 (10 MB) on the broker.
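> > For reference, a rough consumer-side sketch of how the settings above
> > might be applied (bootstrap address, group id and topic name below are
> > placeholders; replica.fetch.max.bytes and message.max.bytes go in the
> > broker's server.properties):
> > import java.util.Arrays;
> > import java.util.Properties;
> > import org.apache.kafka.clients.consumer.KafkaConsumer;
> > Properties props = new Properties();
> > props.put("bootstrap.servers", "kafka:9092");        // placeholder
> > props.put("group.id", "task-workers");               // placeholder
> > props.put("key.deserializer",
> >     "org.apache.kafka.common.serialization.StringDeserializer");
> > props.put("value.deserializer",
> >     "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > props.put("max.partition.fetch.bytes", "10485760");  // 10 MB
> > props.put("session.timeout.ms", "30000");            // 30 seconds
> > KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
> > consumer.subscribe(Arrays.asList("tasks"));          // placeholder topic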
> > Sample code:
> > while (isRunning()) {
> >   ConsumerRecords<String, byte[]> records = consumer.poll(100);
> >   for (final ConsumerRecord<String, byte[]> record : records) {
> >     // Handle record...
> >   }
> > }
> > AFAIK this seems like very basic consumer code.
> > Initial problem: When doing load testing to simulate peaks, our consumer
> > started spinning infinitely in a similar fashion to [1]. We also noticed
> > that we were consistently seeing [2] in our broker log.
> > [1] http://bit.ly/1Q7zxgh
> > [2] https://gist.github.com/JensRantil/2d1e7db3bd919eb35f9b
> > Root cause analysis: AFAIK, health checks are only submitted to Kafka
> > when calling `consumer.poll(...)`. To handle larger messages, we needed
> > to increase max.partition.fetch.bytes. However, due to our high-latency
> > consumer, a large number of small messages could be prefetched, which
> > made our inner for loop run long enough for the broker to consider our
> > consumer dead.
> > Two questions:
> >    - Is there any workaround to avoid the broker thinking our consumer is
> >    dead? Increasing the session timeout to cover the polling interval for
> >    small messages is not an option, since we simply prefetch too many
> >    messages for that to work. Can we set a limit on how many messages
> >    Kafka prefetches? Or is there a way to send health checks to the broker
> >    out of band without invoking the `KafkaConsumer#poll` method?
> >    - Is Kafka a bad tool for our use case?
> > Thanks and have a nice weekend,
> > Jens
> > --
> > Jens Rantil
> > Backend engineer
> > Tink AB
> > Email: jens.ran...@tink.se
> > Phone: +46 708 84 18 32
> > Web: www.tink.se
> > Facebook <https://www.facebook.com/#!/tink.se> LinkedIn
> > <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
> > Twitter <https://twitter.com/tink>
>
