Thanks Guozhang, that clarifies the Streams behavior.

I'm imagining that a Streams application might only commit partition
offsets that have changed, and therefore a partition that is idle for
longer than offsets.retention.minutes might lose its offsets when the app
restarts.  Does that seem plausible?

That theory seems to be supported by a brief look at the code;
StreamTask#commitOffsets() sends only the consumed offsets since the
last commitOffsets call.

This would definitely match behavior I've been puzzled about for a while.
I deploy my Streams app, shove a lot of data at it to see how it is
performing and outputting, then go and do some more development work.
After a day or two (or maybe a weekend), I redeploy the app, and it pops
back to the beginning of all the topics, surprising the heck out of me and
making me think I broke something. :-)

Increasing offsets.retention.minutes seems like the easy immediate fix.  It
might be ideal if a Streams app kept idle offsets refreshed occasionally,
but this problem isn't too likely to impact more realistic use-cases.
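
For anyone hitting the same thing, here is a minimal sketch of that
workaround in the broker's server.properties.  The 7-day value is only an
illustrative assumption, not a recommendation; pick something comfortably
longer than the longest idle period you expect between deployments.

```properties
# Broker-side setting (server.properties), not a Streams or consumer config.
# 10080 minutes = 7 days; illustrative value only.
# The default discussed in this thread is 24 hours (1440 minutes).
offsets.retention.minutes=10080
```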

Mathieu


On Wed, Feb 22, 2017 at 2:18 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Mathieu,
>
> In Streams the consumer config "enable.auto.commit" is always forced to
> false, and a separate "commit.interval.ms" is set. With that even if you do
> not have any data processed the commit operation will be triggered after
> that configured period of time.
>
>
> Guozhang
>
>
> On Wed, Feb 22, 2017 at 8:41 AM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
>
> > Hi Eno,
> >
> > Thanks for the quick reply.  I think that probably does match the data
> > I'm seeing.  This surprises me a bit because my streams app was only
> > offline for a few minutes, but ended up losing its offset.
> >
> > My interpretation is that the source partition had been idle for 24
> > hours, streams doesn't commit offsets for idle partitions, and so the
> > default/unconfigured offset retention of 24 hours had expired.
> >
> > I'll work around this by bumping up my offset retention.  Thanks!
> >
> > Mathieu
> >
> >
> > On Wed, Feb 22, 2017 at 9:22 AM, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> >
> > > Hi Mathieu,
> > >
> > > It could be that the offset retention period has expired. See this:
> > > http://stackoverflow.com/questions/39131465/how-does-an-offset-expire-for-an-apache-kafka-consumer-group
> > >
> > > Thanks
> > > Eno
> > >
> > > > On 22 Feb 2017, at 16:08, Mathieu Fenniak <mathieu.fenn...@replicon.com> wrote:
> > > >
> > > > Hey users,
> > > >
> > > > What causes delete tombstones (value=null) to be sent to the
> > > > __consumer_offsets topic?
> > > >
> > > > I'm observing that a Kafka Streams application that is restarted after a
> > > > crash appears to be reprocessing messages from the beginning of a topic.
> > > > I've dumped the __consumer_offsets topic and found that after the restart,
> > > > messages with a null value are being sent to __consumer_offsets.
> > > >
> > > > I do see that the ConsumerConfig for my StreamThread consumer has
> > > > auto.offset.reset=earliest.  But my understanding of this configuration is
> > > > that it only applies when the offset isn't available, but there are
> > > > definitely offsets for this consumer group stored in __consumer_offsets.
> > > >
> > > > Here's the consumer config for the streams app:
> > > >
> > > > ConsumerConfig values:
> > > >  auto.commit.interval.ms = 5000
> > > >  auto.offset.reset = earliest
> > > >  bootstrap.servers = [10.10.59.184:9092]
> > > >  check.crcs = true
> > > >  client.id = timesheet-list-2d7a7f37-f41a-46b0-a1bb-d47f773012f6-StreamThread-1-consumer
> > > >  connections.max.idle.ms = 540000
> > > >  enable.auto.commit = false
> > > >  exclude.internal.topics = true
> > > >  fetch.max.bytes = 52428800
> > > >  fetch.max.wait.ms = 500
> > > >  fetch.min.bytes = 1
> > > >  group.id = timesheet-list
> > > >  heartbeat.interval.ms = 3000
> > > >  interceptor.classes = null
> > > >  key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> > > >  max.partition.fetch.bytes = 1048576
> > > >  max.poll.interval.ms = 1800000
> > > >  max.poll.records = 1000
> > > >  metadata.max.age.ms = 300000
> > > >  metric.reporters = []
> > > >  metrics.num.samples = 2
> > > >  metrics.recording.level = INFO
> > > >  metrics.sample.window.ms = 30000
> > > >  partition.assignment.strategy = [org.apache.kafka.streams.processor.internals.StreamPartitionAssignor]
> > > >  receive.buffer.bytes = 65536
> > > >  reconnect.backoff.ms = 50
> > > >  request.timeout.ms = 1801000
> > > >  retry.backoff.ms = 100
> > > >  sasl.jaas.config = null
> > > >  sasl.kerberos.kinit.cmd = /usr/bin/kinit
> > > >  sasl.kerberos.min.time.before.relogin = 60000
> > > >  sasl.kerberos.service.name = null
> > > >  sasl.kerberos.ticket.renew.jitter = 0.05
> > > >  sasl.kerberos.ticket.renew.window.factor = 0.8
> > > >  sasl.mechanism = GSSAPI
> > > >  security.protocol = PLAINTEXT
> > > >  send.buffer.bytes = 131072
> > > >  session.timeout.ms = 10000
> > > >  ssl.cipher.suites = null
> > > >  ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> > > >  ssl.endpoint.identification.algorithm = null
> > > >  ssl.key.password = null
> > > >  ssl.keymanager.algorithm = SunX509
> > > >  ssl.keystore.location = null
> > > >  ssl.keystore.password = null
> > > >  ssl.keystore.type = JKS
> > > >  ssl.protocol = TLS
> > > >  ssl.provider = null
> > > >  ssl.secure.random.implementation = null
> > > >  ssl.trustmanager.algorithm = PKIX
> > > >  ssl.truststore.location = null
> > > >  ssl.truststore.password = null
> > > >  ssl.truststore.type = JKS
> > > >  value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>
