This is correct; compression isn't used for the offsets at all. If, for some reason, you do have either a compressed or a corrupt message somewhere in the topic, the method I mentioned previously will flush it out. We haven't seen that as a recurring problem, so fixing it once is sufficient.

-Todd
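A quick aside on the compression side of this, since it comes up repeatedly below: whether messages land in a topic compressed is normally controlled on the producer, and the defaults leave it off. The property names below are the standard producer configs for the 0.8.2-era clients; the lines are only an illustration of what to check, not a quote from anyone's actual config:

    # new Java producer (org.apache.kafka.clients.producer); "none" is the default
    compression.type=none
    # old Scala producer equivalent
    compression.codec=none

So a log-compacted topic should only get into trouble if some producer explicitly turns compression on for it.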
On Mon, Sep 28, 2015 at 9:53 AM, Jason Rosenberg <j...@squareup.com> wrote:

Just to clarify too, if the only use case for log compaction we have is __consumer_offsets, we should be ok, correct? I assume compression is not used by default for consumer offsets?

Jason

On Fri, Sep 25, 2015 at 12:15 AM, Todd Palino <tpal...@gmail.com> wrote:

For now, that's the way it is. Historically, we've only monitored the lag for our infrastructure applications. Other users are responsible for their own checking, typically using the MaxLag MBean or some application-specific metric. Besides the core, we've probably got a dozen or so consumers moved over to Kafka-committed offsets at this point.

Of course, just those apps do cover well over a hundred consumer groups :)

-Todd

On Thursday, September 24, 2015, James Cheng <jch...@tivo.com> wrote:

On Sep 24, 2015, at 8:11 PM, Todd Palino <tpal...@gmail.com> wrote:

Well, in general you can't currently use compressed messages in any topic that has compaction turned on, regardless of whether or not you are using Kafka-committed offsets. The log compaction thread will die either way. There's only one compaction (log cleaner) thread for the broker, and it runs on all topics that use compaction.

Jason, to address your question, it's probably wise to wait for now. Zookeeper offsets work, so unless it's broke, don't fix it for now. We're using Kafka-committed offsets at LinkedIn for our mirror makers and our auditor application (both of which are considered infrastructure applications for Kafka), but we're not encouraging other internal users to switch over just yet.

Burrow depends on Kafka-committed offsets, doesn't it? I guess that means Burrow is only being used to monitor your mirror makers and auditor application, then?

-James

-Todd

On Wed, Sep 23, 2015 at 3:21 PM, James Cheng <jch...@tivo.com> wrote:

On Sep 18, 2015, at 10:25 AM, Todd Palino <tpal...@gmail.com> wrote:

I think the last major issue with log compaction (that it couldn't handle compressed messages) was committed as part of https://issues.apache.org/jira/browse/KAFKA-1374 in August, but I'm not certain what version this will end up in. It may be part of 0.8.2.2.

Regardless, you'll probably be OK now. We've found that once we clean this issue up, it doesn't appear to recur. As long as you're not writing compressed messages to a log-compacted topic (and that won't happen with __consumer_offsets, as it's managed by the brokers themselves - it would only be if you were using other log-compacted topics), you're likely in the clear now.

Todd,

If I understand your description of the problem, you are saying that enabling log compaction on a topic with compressed messages can (will?) cause the log cleaner to crash when it encounters those compressed messages. And the death of the cleaner thread will prevent log compaction from running on other topics, even ones that don't have compressed messages.

That means if we have a cluster where we want to use log compaction on *any* topic, we need to either:
1) apply https://issues.apache.org/jira/browse/KAFKA-1374 (or upgrade to some version in which it is applied)
OR
2) make sure that we don't use compressed messages in *any* topic that has log compaction turned on.

And, more specifically, if we want to make use of __consumer_offsets, then we cannot use compressed messages in any topic that has compaction turned on.

Is that right?
-James

-Todd

On Fri, Sep 18, 2015 at 9:54 AM, John Holland <john.holl...@objectpartners.com> wrote:

Thanks!

I did what you suggested and it worked, except it was necessary for me to remove the cleaner-offset-checkpoint file from the data directory and restart the servers. The log indicates all is well.

Do you know what version the fix to this will be in? I'm not looking forward to dealing with this on a recurring basis.

-John

On Fri, Sep 18, 2015 at 8:48 AM Todd Palino <tpal...@gmail.com> wrote:

Yes, this is a known concern, and it should be fixed with recent commits. In the meantime, you'll have to do a little manual cleanup.

The problem you're running into is a corrupt message in the offsets topic. We've seen this a lot. What you need to do is set the topic configuration to remove the cleanup.policy config, and set retention.ms and segment.ms to something reasonably low. I suggest using a value of 3 or 4 times your commit interval for consumers. Then wait until the log segments are reaped (wait twice as long as the retention.ms you chose, to be safe). Once this is done, you can set the topic configuration back the way it was (remove the segment.ms and retention.ms configs, and set cleanup.policy=compact). Lastly, you'll need to do a rolling bounce of the cluster to restart the brokers (which restarts the log cleaner threads). Technically, you only need to restart brokers where the threads have died, but it's easier to just restart all of them.

Keep in mind that when you do this, you are deleting old offsets. If your consumers are all live and healthy, this shouldn't be a problem because they will just continue to commit their offsets properly. But if you have an offline consumer, you'll lose the committed offsets by doing this.

-Todd
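A rough sketch of what those configuration changes might look like with the 0.8.2-era kafka-topics.sh tool. The zookeeper string and the retention values are placeholders (240000 ms assumes roughly a 60-second consumer commit interval, i.e. about 4x as suggested above), and the exact flag names vary by version - newer brokers do topic overrides through kafka-configs.sh instead - so check --help before running anything:

    # temporarily switch __consumer_offsets from compaction to short time-based retention
    kafka-topics.sh --zookeeper zkhost:2181 --alter --topic __consumer_offsets \
      --delete-config cleanup.policy \
      --config retention.ms=240000 --config segment.ms=240000

    # wait for the old segments to be reaped (at least 2x retention.ms), then restore compaction
    kafka-topics.sh --zookeeper zkhost:2181 --alter --topic __consumer_offsets \
      --delete-config retention.ms --delete-config segment.ms \
      --config cleanup.policy=compact

    # finally, do a rolling restart of the brokers so the log cleaner threads come back

If the cleaner still refuses to run after that, the note earlier in the thread about also removing the cleaner-offset-checkpoint file from the data directory before restarting is worth keeping in mind.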
On Fri, Sep 18, 2015 at 5:31 AM, John Holland <john.holl...@objectpartners.com> wrote:

I've been experiencing this issue across several of our environments ever since we enabled the log cleaner for the __consumer_offsets topic.

We are on version 0.8.2.1 of Kafka, using the new producer. All of our consumers are set to commit to Kafka only.

Below is the stack trace in the log I've encountered across several different clusters. A simple restart of Kafka will allow compaction to continue on all of the other partitions, but the affected one will always fail.

Here are the values for it from the kafka-topics --describe command:

Topic:__consumer_offsets  PartitionCount:50  ReplicationFactor:3  Configs:segment.bytes=104857600,cleanup.policy=compact

Are there any recommendations on how to prevent this and the best way to recover from this exception? This is causing disk space to fill up quickly on the node.

I did see an open issue that seems very similar to this, https://issues.apache.org/jira/browse/KAFKA-1641, but this is the __consumer_offsets topic, which I have not had any part in setting up or producing to.

[2015-09-18 02:57:25,520] INFO Cleaner 0: Beginning cleaning of log __consumer_offsets-17. (kafka.log.LogCleaner)
[2015-09-18 02:57:25,520] INFO Cleaner 0: Building offset map for __consumer_offsets-17... (kafka.log.LogCleaner)
[2015-09-18 02:57:25,609] INFO Cleaner 0: Building offset map for log __consumer_offsets-17 for 46 segments in offset range [468079184, 528707475). (kafka.log.LogCleaner)
[2015-09-18 02:57:25,645] ERROR [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: Last clean offset is 468079184 but segment base offset is 0 for log __consumer_offsets-17.
        at scala.Predef$.require(Predef.scala:233)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:509)
        at kafka.log.Cleaner.clean(LogCleaner.scala:307)
        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
[2015-09-18 02:57:25,654] INFO [kafka-log-cleaner-thread-0], Stopped (kafka.log.LogCleaner)

-John
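For reference, "commit to Kafka only" with the 0.8.2 high-level consumer is normally a matter of two consumer properties. The names below are the standard 0.8.2 consumer configs; the snippet is just an illustration of that setup, not something taken from this thread:

    offsets.storage=kafka
    dual.commit.enabled=false

With dual.commit.enabled=false the consumer stops writing offsets to Zookeeper as well, which is what puts all of the commit traffic onto the __consumer_offsets topic discussed above.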