Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-15 Thread Vincent Rischmann
Hello again, I had the same problem again today. While stopping the broker it crashed, then after upgrading to 0.11.0.2 and restarting the broker it's again taking a really long time to recover. It's been almost 3 hours now and it's not done. I restarted the previous broker which crashed, but s

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-15 Thread Ismael Juma
Hi James, There was a bug in 0.11.0.0 that could cause all segments to be scanned during a restart. I believe that was fixed in subsequent 0.11.0.x releases. Ismael On Fri, Jan 12, 2018 at 6:49 AM, James Cheng wrote: > We saw this as well, when updating from 0.10.1.1 to 0.11.0.1. > > Have you

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-11 Thread James Cheng
We saw this as well, when updating from 0.10.1.1 to 0.11.0.1. Have you restarted your brokers since then? Did it take 8h to start up again, or did it take its normal 45 minutes? I don't think it's related to the crash/recovery. Rather, I think it's due to the upgrade from 0.10.1.1 to 0.11.0.1

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-11 Thread Vincent Rischmann
If anyone else has any idea, I'd love to hear it. Meanwhile, I'll resume upgrading my brokers and hope it doesn't crash and/or take so much time for recovery. On Sat, Jan 6, 2018, at 7:25 PM, Vincent Rischmann wrote: > Hi, > > just to clarify: this is the cause of the crash > https://pastebin.

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Vincent Rischmann
Hi, just to clarify: this is the cause of the crash https://pastebin.com/GuF60kvF in the broker logs, which is why I referenced https://issues.apache.org/jira/browse/KAFKA-4523 I had this crash some time ago and yesterday was in the process of upgrading my brokers to 0.11.0.2 in part to addres

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Ted Yu
Ismael: We're on the same page. 0.11.0.2 was released on 17 Nov 2017. By 'recently' in my previous email I meant the change was newer. Vincent: Did the machine your broker ran on experience a power issue? Cheers On Sat, Jan 6, 2018 at 7:36 AM, Ismael Juma wrote: > Hi Ted, > > The change you m

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Ismael Juma
Hi Ted, The change you mention is not part of 0.11.0.2. Ismael On Sat, Jan 6, 2018 at 3:31 PM, Ted Yu wrote: > bq. WARN Found a corrupted index file due to requirement failed: Corrupt > index found, index file > (/data/kafka/data-processed-15/54942918.index) > > Can you search back

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Ted Yu
bq. WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/data/kafka/data-processed-15/54942918.index) Can you search backward for 54942918.index in the log to see if we can find the cause of the corruption? This part of code was rece
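The backward search suggested here can be sketched in Python. This is a minimal illustration, assuming the broker log has already been read into a list of lines (for example from `server.log`, assumed here to be the broker's main log file). `find_mentions_backward` and `corrupted_index_files` are hypothetical helper names, and the regex assumes only the "Corrupt index found, index file (...)" wording quoted in the warning above.

```python
import re
from typing import Iterable, List

# Assumption: matches the warning wording quoted in the thread, e.g.
# "Corrupt index found, index file (/data/kafka/data-processed-15/54942918.index)"
CORRUPT_INDEX_RE = re.compile(r"Corrupt index found, index file \((?P<path>[^)]+)\)")


def find_mentions_backward(lines: Iterable[str], needle: str) -> List[str]:
    """Hypothetical helper: every line containing `needle`, newest (last) first."""
    matches = [line.rstrip("\n") for line in lines if needle in line]
    matches.reverse()  # walk the log from the end back toward the start
    return matches


def corrupted_index_files(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: index paths from 'Corrupt index found' warnings, in log order."""
    paths: List[str] = []
    for line in lines:
        match = CORRUPT_INDEX_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


# Illustrative lines only; not taken from the pastebin logs in this thread.
sample = [
    "INFO Loading logs.",
    "WARN Found a corrupted index file due to requirement failed: Corrupt index found, "
    "index file (/data/kafka/data-processed-15/54942918.index) ...",
    "INFO Recovering segment 54942918",
]

for line in find_mentions_backward(sample, "54942918"):
    print(line)
print(corrupted_index_files(sample))
```

Against a real log, the same helpers could be fed `open(path, encoding="utf-8", errors="replace").readlines()`; reading everything into memory keeps the sketch simple but may be costly for very large logs.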

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Matt Farmer
This is “normal” as far as I know. We’ve seen this behavior after unclean shutdowns of 0.10.1.1. In the event of an unclean shutdown Kafka seems to have to rebuild some indexes and for large data directories this takes some time. We got bit by this a few times recently when we had boxes that po

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-06 Thread Vincent Rischmann
Here's an excerpt just after the broker started: https://pastebin.com/tZqze4Ya After more than 8 hours of recovery the broker finally started. I haven't read through all 8 hours of log but the parts I looked at are like the pastebin. I'm not seeing much in the log cleaner logs either, they look

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-05 Thread Brett Rann
What do the broker logs say it's doing during all that time? There are some consumer offset / log cleaner bugs which caused us similar log delays. That was easily visible by watching the log cleaner activity in the logs, and in our monitoring of partition sizes watching them go down, along with I
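A rough way to monitor partition sizes going down is to sample the on-disk size of each partition directory periodically and compare consecutive samples. This is a minimal Python sketch. It assumes a data directory laid out as one sub-directory per partition (like `/data/kafka/data-processed-15` in the warning quoted above). `partition_sizes` and `shrinking` are hypothetical helpers, not part of Kafka's tooling.

```python
import os
from typing import Dict


def partition_sizes(data_dir: str) -> Dict[str, int]:
    """Hypothetical helper: total bytes per partition directory directly under data_dir."""
    sizes: Dict[str, int] = {}
    with os.scandir(data_dir) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip top-level files such as checkpoint files
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
            sizes[entry.name] = total
    return sizes


def shrinking(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: partitions whose size went down between two samples, with bytes freed."""
    return {
        name: before[name] - after[name]
        for name in before
        if name in after and after[name] < before[name]
    }
```

Taking one `partition_sizes` sample, waiting a few minutes, taking another, and passing both to `shrinking` shows which partitions are being compacted or cleaned. Files can be deleted between listing and `getsize` on a live broker, so a production version would need to tolerate `FileNotFoundError`.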

Insanely long recovery time with Kafka 0.11.0.2

2018-01-05 Thread Vincent Rischmann
Hello, so I'm upgrading my brokers from 0.10.1.1 to 0.11.0.2 to fix this bug https://issues.apache.org/jira/browse/KAFKA-4523 Unfortunately while stopping one broker, it crashed exactly because of this bug. No big deal usually, except after restarting Kafka in 0.11.0.2 the recovery is taking a rea