Hi,

What I can see from the configurations:

> log.dir = /tmp/kafka-logs (default)
> log.dirs = /var/kafkadata/data01/data

From the documentation, log.dir is only used if log.dirs is not set, so /var/kafkadata/data01/data is the folder used for logs.
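
For reference, this is what the relevant server.properties lines look like (a minimal sketch using the values above):

    # log.dirs takes precedence whenever it is set; log.dir is the older single-path setting
    log.dir=/tmp/kafka-logs
    log.dirs=/var/kafkadata/data01/data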
Regards

On Tue, May 5, 2020 at 08:52, Nicolas Carlot <nicolas.car...@chronopost.fr.invalid> wrote:

> Hi guys, still following your discussion even if it's out of my reach. Just noticed that you use /tmp/ for your logs; not sure that's a good idea: https://issues.apache.org/jira/browse/KAFKA-3925
>
> On Mon, May 4, 2020 at 19:40, JP MB <jose.brandao1...@gmail.com> wrote:
>
> > Here are the startup logs from a deployment where we lost 15 messages in topic-p:
> > https://gist.github.com/josebrandao13/81271140e59e28eda7aaa777d2d3b02c
> >
> > .timeindex file state before the deployment:
> > * Partitions with messages: timestamp mismatch
> > * Partitions without messages: permission denied
> >
> > .timeindex file state after the deployment:
> > * All partitions without messages: permission denied, new files were created.
> >
> > I don't see anything in particular in the logs, but you can see that the messages are deleted with largestTime=0 and are from this morning.
> >
> > On Mon, May 4, 2020 at 11:37, JP MB <jose.brandao1...@gmail.com> wrote:
> >
> > > Hi guys,
> > >
> > > I'm going to get back to this today. I have mixed feelings about the volumes being the cause. This volume switching has been around for quite some time, in a lot of clusters, and we only started noticing this problem when we updated some of them. Also, this happens in only a few of those .timeindex files, not in all of them. The .log and .index files, which are also on the volumes, don't have the problem either.
> > >
> > > Additionally, I'm a bit confused about what the initial state of those .timeindex files should be. Sometimes I see "found log offset: -1", other times the timestamp mismatch error "Index timestamp: 0, log timestamp: 1588583643582", and sometimes something like "Indexed offset: 0, found log offset: 28".
> > >
> > > We have seen previously that whenever the timestamp mismatch error is present, we eventually lose messages after a deployment. Since this looks like the trigger for the problem, I would like to understand how it can happen. So my question is: how can each of those different initialization states of the .timeindexes come about? We can reproduce all of them when running Kafka locally.
> > >
> > > Meanwhile, I'm trying to reproduce the situation in our dev environments and get some startup logs, and I will play with the log flush settings.
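> > >
> > > For anyone following along: states like these are what the bundled dump tool reports when a .timeindex is checked against its segment - a sketch, assuming kafka-dump-log.sh (shipped since Kafka 1.1); the segment path is only an illustration:
> > >
> > >     # dumps the time index entries and reports mismatches against the log
> > >     bin/kafka-dump-log.sh --files /var/kafkadata/data01/data/topic-p-0/00000000000000000000.timeindex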
> > >
> > > Regards
> > >
> > > On Sat, May 2, 2020 at 14:45, Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > >
> > > > Good luck JP, do try it with the volume switching commented out, and see how it goes.
> > > >
> > > > On Fri, May 1, 2020 at 6:50 PM JP MB <jose.brandao1...@gmail.com> wrote:
> > > >
> > > > > Thank you very much for the help anyway.
> > > > >
> > > > > Best regards
> > > > >
> > > > > On Fri, May 1, 2020, 00:54 Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > > > >
> > > > > > So the logs show a healthy shutdown, so we can eliminate that as an issue. I would look next at the volume management during a rollout, based on the other error messages you had earlier about permission denied etc. It's possible there are some journalled but not yet flushed changes in those time indexes, but at this point we're getting into filesystem internals, which aren't my forte. But if you can temporarily disable the volume switching and do a test rollout, seeing whether you get the same problems or not would help eliminate or confirm it.
> > > > > >
> > > > > > Sorry I can't help further on that.
> > > > > >
> > > > > > On Fri, May 1, 2020 at 5:34 AM JP MB <jose.brandao1...@gmail.com> wrote:
> > > > > >
> > > > > > > It took me a while because I needed logs of the server shutting down when this occurs. Here they are; I can see some errors:
> > > > > > > https://gist.github.com/josebrandao13/e8b82469d3e9ad91fbf38cf139b5a726
> > > > > > >
> > > > > > > Regarding systemd, the closest thing I could find to TimeoutStopSec was DefaultTimeoutStopUSec=1min 30s, i.e. 90 seconds. I could not find any KillSignal or RestartKillSignal. You can see the output of systemctl show --all here:
> > > > > > > https://gist.github.com/josebrandao13/f2dd646fab19b19f127981fce92d78c4
> > > > > > >
> > > > > > > Once again, thanks for the help.
> > > > > > >
> > > > > > > On Thu, Apr 30, 2020 at 15:04, Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > > > > > >
> > > > > > > > I'd also suggest eyeballing your systemd conf to verify that someone hasn't set a very low TimeoutStopSec, or that KillSignal/RestartKillSignal haven't been configured to SIGKILL (confusingly named, imo, as the default for KillSignal is SIGTERM).
> > > > > > > >
> > > > > > > > Also, the Kafka broker logs at shutdown look very different if it shut down cleanly vs if it didn't. Could you perhaps put them in a Gist and email the link?
> > > > > > > >
> > > > > > > > Just trying to make sure basic assumptions are holding :)
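> > > > > > > >
> > > > > > > > Those settings can be pulled out directly instead of wading through show --all - a sketch, assuming the unit is named kafka.service:
> > > > > > > >
> > > > > > > >     # show only the stop timeout and kill-signal behaviour of the unit
> > > > > > > >     systemctl show kafka -p TimeoutStopUSec -p KillSignal -p SendSIGKILL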
> > > > > > > >
> > > > > > > > On Fri, 1 May 2020, 1:21 am JP MB, <jose.brandao1...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > It's quite a complex script generated with Ansible, where we use a/b deployment, and honestly I don't have full knowledge of it. I can share the general guidelines of what is done:
> > > > > > > > >
> > > > > > > > > - Any old volumes from previous releases are removed (named with suffix '-old').
> > > > > > > > > - The volumes attached to the old host are detached.
> > > > > > > > > - The service on the old host is stopped - uses systemctl stop kafka.
> > > > > > > > > - We attempt to create a CNAME volume: this is a volume with the same name (no suffix) that will be attached to the new box. Except for the very first run, this task is used to get the information about the existing volume.
> > > > > > > > > - A new volume is created as a copy of the CNAME volume (named with suffix '-new').
> > > > > > > > > - The new volume is attached to the host/VM (named with suffix '-new').
> > > > > > > > > - The new volume is formatted (except for the very first run, when it's already formatted) (named with suffix '-new').
> > > > > > > > > - The new volume is mounted (named with suffix '-new').
> > > > > > > > > - The service on the new host is started - uses systemctl start kafka.
> > > > > > > > > - If everything went well stopping/starting services:
> > > > > > > > >   - The volume on the old host is renamed with suffix '-old'.
> > > > > > > > >   - The new volume is renamed, stripping the suffix '-new'.
> > > > > > > > >
> > > > > > > > > I made a new experiment today with some interesting findings. We had 518 messages in a given topic; after a deployment we lost 9 to this problem, in partitions 13, 15, 16 and 17. These are all the errors I could find in the time index files before the deployment (left is the partition number):
> > > > > > > > >
> > > > > > > > > 11 -> timestamp mismatch on 685803 - offsets from 685801 to 685805, no message loss here
> > > > > > > > > 12 -> -1 error, no indexes on the log - base segment was the last offset, so OK
> > > > > > > > > 13 -> timestamp mismatch on 823168 - offsets from 823168 to 823172, four messages lost
> > > > > > > > > 14 -> timestamp mismatch on 619257 - offsets from 619253 to 619258, no message loss here
> > > > > > > > > 15 -> timestamp mismatch on 658783 - offsets from 658783 to 658784, one message missing
> > > > > > > > > 16 -> timestamp mismatch on 623508 - offsets from 623508 to 623509, one message missing
> > > > > > > > > 17 -> timestamp mismatch on 515479 - offsets from 515479 to 515481, two messages missing
> > > > > > > > >
> > > > > > > > > After the deployment, I took a look and the state was this:
> > > > > > > > >
> > > > > > > > > 11 -> timestamp mismatch error on 685803 - same state
> > > > > > > > > 12 -> -1 error, no indexes on the log - same state
> > > > > > > > > 13 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 14 -> timestamp mismatch error on 619257 - same state
> > > > > > > > > 15 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 16 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 17 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > >
> > > > > > > > > Some conclusions at this point:
> > > > > > > > >
> > > > > > > > > - We only lost messages where the segment's initial offset (the base offset) had a corrupted .timeindex file.
> > > > > > > > > - Immediately after the deployment, we were unable to open any of the partitions where we lost messages: Permission denied (a quick way to list these is sketched below).
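> > > > > > > > >
> > > > > > > > > A quick way to list those unreadable index files across the data directory - a sketch, assuming GNU find, run as the broker user:
> > > > > > > > >
> > > > > > > > >     # lists every time index the current user cannot read
> > > > > > > > >     find /var/kafkadata/data01/data -name '*.timeindex' ! -readable -ls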
> > > > > > > > >
> > > > > > > > > That was yesterday at the end of the day. Today I checked the number of messages and it had dropped from 509 to 493. The state of the .timeindex files had also changed:
> > > > > > > > >
> > > > > > > > > 11 -> Exception in thread "main" java.io.IOException: Permission denied - changed state
> > > > > > > > > 12 -> -1 error, no indexes on the log
> > > > > > > > > 13 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 14 -> Exception in thread "main" java.io.IOException: Permission denied - changed state
> > > > > > > > > 15 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 16 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > > 17 -> Exception in thread "main" java.io.IOException: Permission denied
> > > > > > > > >
> > > > > > > > > So the partition 11 and 14 timeindex files were the ones with the timestamp mismatch error that didn't lose messages immediately after the deployment. After the deployment, with the cluster already running, both changed to permission denied and all the messages inside those partitions (11 & 14) were gone. So this happened not only immediately after the rolling deployment but also while the cluster was running.
> > > > > > > > >
> > > > > > > > > I manually restarted a broker with systemctl stop (took 2-3 seconds) & systemctl start, and all those "permission denied" errors turned into "-1 error, no indexes on the log", as if the files had been reset. The other brokers still show permission denied.
> > > > > > > > >
> > > > > > > > > Does this suggest anything to you? I don't really have an idea of what could be corrupting those index files.
> > > > > > > > >
> > > > > > > > > Next things I will check:
> > > > > > > > >
> > > > > > > > > - When exactly those messages were deleted in partitions 11 and 14.
> > > > > > > > > - What happens if I have timeindex files with a "timestamp mismatch error" and manually stop and start a broker (see the sketch below).
> > > > > > > > >
> > > > > > > > > Once again, thanks for the efforts in answering.
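> > > > > > > > >
> > > > > > > > > For that second check, the stop/inspect/start cycle would be something like this - a sketch; the segment file name is illustrative:
> > > > > > > > >
> > > > > > > > >     systemctl stop kafka      # clean stop: systemd sends SIGTERM and waits
> > > > > > > > >     bin/kafka-dump-log.sh --files /var/kafkadata/data01/data/topic-p-11/00000000000000685801.timeindex
> > > > > > > > >     systemctl start kafka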
> > > > > > > > >
> > > > > > > > > On Thu, Apr 30, 2020 at 09:39, Goran Sliskovic <gslis...@yahoo.com.invalid> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > It does look like index corruption... Can you post the script that stops Kafka?
> > > > > > > > > >
> > > > > > > > > > On Wednesday, April 29, 2020, 06:38:18 PM GMT+2, JP MB <jose.brandao1...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > Can you try using the console consumer to display messages/keys and timestamps? --property print.key=true --property print.timestamp=true
> > > > > > > > > > >
> > > > > > > > > > > There are a lot of messages, so I'm picking one example without and one with a time index entry. All of them have a null key:
> > > > > > > > > > >
> > > > > > > > > > > Offset 57 CreateTime:1588074808027 Key:null - no time index
> > > > > > > > > > > Offset 144 CreateTime:1588157145655 Key:null - has time index
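> > > > > > > > > > >
> > > > > > > > > > > For reference, the full command for this - a sketch; the topic name is illustrative:
> > > > > > > > > > >
> > > > > > > > > > >     bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
> > > > > > > > > > >       --topic topic-p --from-beginning \
> > > > > > > > > > >       --property print.key=true --property print.timestamp=true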
> > > > > > > > > > >
> > > > > > > > > > > > Hmm, how are you doing your rolling deploys?
> > > > > > > > > > >
> > > > > > > > > > > It's a rolling deployment: we take one node down and spin up a new one, one at a time.
> > > > > > > > > > >
> > > > > > > > > > > > I'm wondering if the time indexes are being corrupted by unclean shutdowns. I've been reading code, and the only path I could find that led to a largest timestamp of 0 was, as you've discovered, where there was no time index. WRT the corruption - the broker being SIGKILLed (systemctl by default sends SIGKILL 90 seconds after SIGTERM, and our broker needed 120s to shut down cleanly) has caused index corruption for us in the past, although in our case the broker recovered from it automatically. It just took 2 hours.
> > > > > > > > > > >
> > > > > > > > > > > This would be a perfect justification for it, but we use systemctl stop and it takes around 4 seconds to shut down, so I believe it ends gracefully before SIGKILL?
> > > > > > > > > > >
> > > > > > > > > > > > Also, are you moving between versions with these deploys?
> > > > > > > > > > >
> > > > > > > > > > > No, we have several clusters where this is happening. The information I showed you is from a cluster on version 2.3, but with 0.10.2 for the inter-broker protocol and log format. We have also experienced this on fully updated 2.4 and 2.4.1 clusters. To sum up: the experiments always deploy (again) the version that is already there.
> > > > > > > > > > >
> > > > > > > > > > > Thanks all for the efforts so far.
> > > > > > > > > > >
> > > > > > > > > > > ...
>
> --
> Nicolas Carlot
> Lead dev | nicolas.car...@chronopost.fr
> chronopost.fr <http://www.chronopost.fr/>