Hi,

What I can see from the configurations:

> log.dir = /tmp/kafka-logs (default)
> log.dirs = /var/kafkadata/data01/data

From the documentation, log.dir is only used if log.dirs is not set, so /var/kafkadata/data01/data is the folder used for logs.
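
For reference, this is what the relevant server.properties lines look like (a minimal sketch using the values above):

    # log.dirs takes precedence whenever it is set; log.dir is the older single-path setting
    log.dir=/tmp/kafka-logs
    log.dirs=/var/kafkadata/data01/data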
Regards

On Tue, May 5, 2020 at 08:52, Nicolas Carlot <nicolas.car...@chronopost.fr.invalid> wrote:

> Hi guys, still following your discussion even if it's out of my reach. Just noticed that you use /tmp/ for your logs; not sure that's a good idea: https://issues.apache.org/jira/browse/KAFKA-3925
>
> On Mon, May 4, 2020 at 19:40, JP MB <jose.brandao1...@gmail.com> wrote:
>
> > Here are the startup logs from a deployment where we lost 15 messages in topic-p:
> > https://gist.github.com/josebrandao13/81271140e59e28eda7aaa777d2d3b02c
> >
> > .timeindex file state before the deployment:
> > * Partitions with messages: timestamp mismatch
> > * Partitions without messages: permission denied
> >
> > .timeindex file state after the deployment:
> > * All partitions without messages: permission denied, new files were created.
> >
> > I don't see anything in particular in the logs, but you can see that the messages are deleted with largestTime=0 and are from this morning.
> >
> > On Mon, May 4, 2020 at 11:37, JP MB <jose.brandao1...@gmail.com> wrote:
> >
> > > Hi guys,
> > >
> > > I'm going to get back to this today. I have mixed feelings about the volumes being the cause. This volume switching has been around for quite some time, in a lot of clusters, and we only started noticing this problem when we updated some of them. Also, this happens in only a few of those .timeindex files, not in all of them. The .log and .index files, which are also on the volumes, don't have the problem either.
> > >
> > > Additionally, I'm a bit confused about what the initial state of those .timeindex files should be. Sometimes I see "found log offset: -1", other times the timestamp mismatch error "Index timestamp: 0, log timestamp: 1588583643582", and sometimes something like "Indexed offset: 0, found log offset: 28".
> > >
> > > We have seen previously that whenever the timestamp mismatch error is present, we eventually lose messages after a deployment. Since this looks like the trigger for the problem, I would like to understand how it can happen. So my question is: how can each of those different initialization states of the .timeindexes come about? We can reproduce all of them when running Kafka locally.
> > >
> > > Meanwhile, I'm trying to reproduce the situation in our dev environments and get some startup logs, and I will play with the log flush settings.
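> > >
> > > For anyone following along: states like these are what the bundled dump tool reports when a .timeindex is checked against its segment - a sketch, assuming kafka-dump-log.sh (shipped since Kafka 1.1); the segment path is only an illustration:
> > >
> > >     # dumps the time index entries and reports mismatches against the log
> > >     bin/kafka-dump-log.sh --files /var/kafkadata/data01/data/topic-p-0/00000000000000000000.timeindex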
> > >
> > > Regards
> > >
> > > On Sat, May 2, 2020 at 14:45, Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > >
> > > > Good luck JP, do try it with the volume switching commented out, and see how it goes.
> > > >
> > > > On Fri, May 1, 2020 at 6:50 PM JP MB <jose.brandao1...@gmail.com> wrote:
> > > >
> > > > > Thank you very much for the help anyway.
> > > > >
> > > > > Best regards
> > > > >
> > > > > On Fri, May 1, 2020, 00:54 Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > > > >
> > > > > > So the logs show a healthy shutdown, so we can eliminate that as an issue. I would look next at the volume management during a rollout, based on the other error messages you had earlier about permission denied etc. It's possible there are some journalled but not yet flushed changes in those time indexes, but at this point we're getting into filesystem internals, which aren't my forte. But if you can temporarily disable the volume switching and do a test rollout, seeing whether you get the same problems or not would help eliminate or confirm it.
> > > > > >
> > > > > > Sorry I can't help further on that.
> > > > > >
> > > > > > On Fri, May 1, 2020 at 5:34 AM JP MB <jose.brandao1...@gmail.com> wrote:
> > > > > >
> > > > > > > It took me a while because I needed logs of the server shutting down when this occurs. Here they are; I can see some errors:
> > > > > > > https://gist.github.com/josebrandao13/e8b82469d3e9ad91fbf38cf139b5a726
> > > > > > >
> > > > > > > Regarding systemd, the closest thing I could find to TimeoutStopSec was DefaultTimeoutStopUSec=1min 30s, i.e. 90 seconds. I could not find any KillSignal or RestartKillSignal. You can see the output of systemctl show --all here:
> > > > > > > https://gist.github.com/josebrandao13/f2dd646fab19b19f127981fce92d78c4
> > > > > > >
> > > > > > > Once again, thanks for the help.
> > > > > > >
> > > > > > > On Thu, Apr 30, 2020 at 15:04, Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > > > > > >
> > > > > > > > I'd also suggest eyeballing your systemd conf to verify that someone hasn't set a very low TimeoutStopSec, or that KillSignal/RestartKillSignal haven't been configured to SIGKILL (confusingly named, imo, as the default for KillSignal is SIGTERM).
> > > > > > > >
> > > > > > > > Also, the Kafka broker logs at shutdown look very different if it shut down cleanly vs if it didn't. Could you perhaps put them in a Gist and email the link?
> > > > > > > >
> > > > > > > > Just trying to make sure basic assumptions are holding :)
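> > > > > > > >
> > > > > > > > Those settings can be pulled out directly instead of wading through show --all - a sketch, assuming the unit is named kafka.service:
> > > > > > > >
> > > > > > > >     # show only the stop timeout and kill-signal behaviour of the unit
> > > > > > > >     systemctl show kafka -p TimeoutStopUSec -p KillSignal -p SendSIGKILL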
> > > > > > > >
> > > > > > > > On Fri, 1 May 2020, 1:21 am JP MB, <jose.brandao1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > It's quite a complex script generated with Ansible, where we use a/b deployment, and honestly I don't have full knowledge of it. I can share the general guidelines of what is done:
> > > > > > > > >
> > > > > > > > > - Any old volumes from previous releases are removed (named with suffix '-old').
> > > > > > > > > - The volumes attached to the old host are detached.
> > > > > > > > > - The service on the old host is stopped - uses systemctl stop kafka.
> > > > > > > > > - We attempt to create a CNAME volume: this is a volume with the same name (no suffix) that will be attached to the new box. Except for the very first run, this task is used to get the information about the existing volume.
> > > > > > > > > - A new volume is created as a copy of the CNAME volume (named with suffix '-new').
> > > > > > > > > - The new volume is attached to the host/VM (named with suffix '-new').
> > > > > > > > > - The new volume is formatted (except for the very first run, when it's already formatted) (named with suffix '-new').
> > > > > > > > > - The new volume is mounted (named with suffix '-new').
> > > > > > > > > - The service on the new host is started - uses systemctl start kafka.
> > > > > > > > > - If everything went well stopping/starting services:
> > > > > > > > >   - The volume on the old host is renamed with suffix '-old'.
> > > > > > > > >   - The new volume is renamed, stripping the suffix '-new'.
> > > > > > > > >
> > > > > > > > > I made a new experiment today with some interesting findings. We had 518 messages in a given topic; after a deployment we lost 9 to this problem, in partitions 13, 15, 16 and 17. These are all the errors I could find in the time index files before the deployment (left is the partition number):
> > > > > > > > >
> > > > > > > > > 11 -> timestamp mismatch on 685803 - offsets from 685801 to 685805, no message loss here
> > > > > > > > > 12 -> -1 error, no indexes on the log - base segment was the last offset, so OK
> > > > > > > > > 13 -> timestamp mismatch on 823168 - offsets from 823168 to 823172, four messages lost
> > > > > > > > > 14 -> timestamp mismatch on 619257 - offsets from 619253 to 619258, no message loss here
> > > > > > > > > 15 -> timestamp mismatch on 658783 - offsets from 658783 to 658784, one message missing
> > > > > > > > > 16 -> timestamp mismatch on 623508 - offsets from 623508 to 623509, one message missing
> > > > > > > > > 17 -> timestamp mismatch on 515479 - offsets from 515479 to 515481, two messages missing
> > > > > > > > >
> > > > > > > > > After the deployment, I took a look and the state was this:
> > > > > > > > >
> > > > > > > > > 11 -> timestamp mismatch error on 685803 - same state
> > > > > > > > > 12 -> -1 error, no indexes on the log - same state
> > > > > > > > > 13 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 14 -> timestamp mismatch error on 619257 - same state
> > > > > > > > > 15 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 16 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 17 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > >
> > > > > > > > > Some conclusions at this point:
> > > > > > > > >
> > > > > > > > > - We only lost messages where the segment's initial offset (the base offset) had a corrupted .timeindex file.
> > > > > > > > > - Immediately after the deployment, we were unable to open any of the partitions where we lost messages: Permission denied (a quick way to list these is sketched below).
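> > > > > > > > >
> > > > > > > > > A quick way to list those unreadable index files across the data directory - a sketch, assuming GNU find, run as the broker user:
> > > > > > > > >
> > > > > > > > >     # lists every time index the current user cannot read
> > > > > > > > >     find /var/kafkadata/data01/data -name '*.timeindex' ! -readable -ls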
> > > > > > > > >
> > > > > > > > > That was yesterday at the end of the day. Today I checked the number of messages and it had dropped from 509 to 493. The state of the .timeindex files had also changed:
> > > > > > > > >
> > > > > > > > > 11 -> Exception in thread "main" java.io.IOException: Permission denied - changed state
> > > > > > > > > 12 -> -1 error, no indexes on the log
> > > > > > > > > 13 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 14 -> Exception in thread "main" java.io.IOException: Permission denied - changed state
> > > > > > > > > 15 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 16 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 17 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > >
> > > > > > > > > So the partition 11 and 14 timeindex files were the ones with the timestamp mismatch error that didn't lose messages immediately after the deployment. After the deployment, with the cluster already running, both changed to permission denied and all the messages inside those partitions (11 & 14) were gone. So this happened not only immediately after the rolling deployment but also while the cluster was running.
> > > > > > > > >
> > > > > > > > > I manually restarted a broker with systemctl stop (took 2-3 seconds) & systemctl start, and all those "permission denied" errors turned into "-1 error, no indexes on the log", as if the files had been reset. The other brokers still show permission denied.
> > > > > > > > >
> > > > > > > > > Does this suggest anything to you? I don't really have an idea of what could be corrupting those index files.
> > > > > > > > >
> > > > > > > > > Next things I will check:
> > > > > > > > >
> > > > > > > > > - When exactly those messages were deleted in partitions 11 and 14.
> > > > > > > > > - What happens if I have timeindex files with a "timestamp mismatch error" and manually stop and start a broker (see the sketch below).
> > > > > > > > >
> > > > > > > > > Once again, thanks for the efforts in answering.
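> > > > > > > > >
> > > > > > > > > For that second check, the stop/inspect/start cycle would be something like this - a sketch; the segment file name is illustrative:
> > > > > > > > >
> > > > > > > > >     systemctl stop kafka      # clean stop: systemd sends SIGTERM and waits
> > > > > > > > >     bin/kafka-dump-log.sh --files /var/kafkadata/data01/data/topic-p-11/00000000000000685801.timeindex
> > > > > > > > >     systemctl start kafka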
> > > > > > > > >
> > > > > > > > > On Thu, Apr 30, 2020 at 09:39, Goran Sliskovic <gslis...@yahoo.com.invalid> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > It does look like index corruption... Can you post the script that stops Kafka?
> > > > > > > > > >
> > > > > > > > > > On Wednesday, April 29, 2020, 06:38:18 PM GMT+2, JP MB <jose.brandao1...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > Can you try using the console consumer to display messages/keys and timestamps? --property print.key=true --property print.timestamp=true
> > > > > > > > > > >
> > > > > > > > > > > There are a lot of messages, so I'm picking one example without and one with a time index entry. All of them have a null key:
> > > > > > > > > > >
> > > > > > > > > > > Offset 57 CreateTime:1588074808027 Key:null - no time index
> > > > > > > > > > > Offset 144 CreateTime:1588157145655 Key:null - has time index
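> > > > > > > > > > >
> > > > > > > > > > > For reference, the full command for this - a sketch; the topic name is illustrative:
> > > > > > > > > > >
> > > > > > > > > > >     bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
> > > > > > > > > > >       --topic topic-p --from-beginning \
> > > > > > > > > > >       --property print.key=true --property print.timestamp=true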
> > > > > > > > > > >
> > > > > > > > > > > > Hmm, how are you doing your rolling deploys?
> > > > > > > > > > >
> > > > > > > > > > > It's a rolling deployment: we take one node down and spin up a new one, one at a time.
> > > > > > > > > > >
> > > > > > > > > > > > I'm wondering if the time indexes are being corrupted by unclean shutdowns. I've been reading code, and the only path I could find that led to a largest timestamp of 0 was, as you've discovered, where there was no time index. WRT the corruption - the broker being SIGKILLed (systemctl by default sends SIGKILL 90 seconds after SIGTERM, and our broker needed 120s to shut down cleanly) has caused index corruption for us in the past, although in our case the broker recovered from it automatically. It just took 2 hours.
> > > > > > > > > > >
> > > > > > > > > > > This would be a perfect justification for it, but we use systemctl stop and it takes around 4 seconds to shut down, so I believe it ends gracefully before SIGKILL?
> > > > > > > > > > >
> > > > > > > > > > > > Also, are you moving between versions with these deploys?
> > > > > > > > > > >
> > > > > > > > > > > No, we have several clusters where this is happening. The information I showed you is from a cluster on version 2.3, but with 0.10.2 for the inter-broker protocol and log format. We have also experienced this on fully updated 2.4 and 2.4.1 clusters. To sum up: the experiments always deploy (again) the version that is already there.
> > > > > > > > > > >
> > > > > > > > > > > Thanks all for the efforts so far.
> > > > > > > > > > >
> > > > > > > > > > > ...
>
> --
> Nicolas Carlot
> Lead dev | nicolas.car...@chronopost.fr
> chronopost.fr <http://www.chronopost.fr/>