Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-12-21 Thread Jun MA
Hi Peter, We’ve seen this happen under normal operation in our virtualized environment as well. Our network is not very stable; blips happen pretty frequently. Your explanation sounds reasonable to me, and I’m very interested in your further thoughts on this. In our case, we’re using quorum based r

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-12-20 Thread Jun MA
Hi B, Thanks for your reply. To clarify: in our case, it should be the leader that crashed or had something go wrong with its disk, not the followers (the 2 brokers that show the FATAL error)? > On Dec 19, 2016, at 7:46 PM, Ben Stopford wrote: > > Hi Jun > > This should only be possible in situati

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-12-19 Thread Ben Stopford
Hi Jun This should only be possible in situations where there is a crash or something happens to the underlying disks (assuming clean leader election). I've not come across others. The assumption, as I understand it, is that the underlying issue stems from KAFKA-1211
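
A minimal sketch of the condition the error in the subject line describes, assuming the follower compares its own latest offset with the leader's and refuses to truncate when unclean leader election is disabled for the topic. The function name, parameter names, and return labels are hypothetical; this is a simplified model for readers of the thread, not the broker's actual code.

```python
def follower_action(leader_latest_offset: int,
                    follower_latest_offset: int,
                    unclean_leader_election_enabled: bool) -> str:
    """Model what a follower does when its log is compared with the leader's.

    All names here are illustrative assumptions, not Kafka identifiers.
    """
    # The follower is not ahead of the leader: nothing has to be cut off.
    if leader_latest_offset >= follower_latest_offset:
        return "no_truncation_needed"

    # The follower holds messages the current leader does not have.
    # Truncating would discard them, which is only acceptable when
    # unclean leader election is enabled for the topic.
    if unclean_leader_election_enabled:
        return "truncate_to_leader_offset"

    # Otherwise the broker stops rather than silently losing data.
    return "halt"


if __name__ == "__main__":
    # Follower ahead of the leader with clean leader election only.
    print(follower_action(100, 105, unclean_leader_election_enabled=False))
```

Under these assumptions, the fatal halt is reached only when the follower is ahead of the leader and truncation is disallowed, which fits the description of it requiring a crash or a disk problem on the leader side.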

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-06-26 Thread Peter Davis
Thanks James. Has anyone else seen this happen under normal operation? So far I have not thought of a way to reliably recreate the issue under normal(ish) circumstances. I haven't even been able to prove the true nature of the network issues yet. The only evidence is that it happened 3 times last week in

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-06-26 Thread James Cheng
Peter, can you add some of your observations to those JIRAs? You seem to have a good understanding of the problem. Maybe there is something that can be improved in the codebase to prevent this from happening, or reduce the impact of it. Wanny, you might want to add a "me too" to the JIRAs as we

Re: Halting because log truncation is not allowed for topic __consumer_offsets

2016-06-26 Thread Peter Davis
We have seen this several times and it's quite frustrating. It seems to happen because the leader for a partition writes to followers ahead of committing itself, especially for a topic like __consumer_offsets that is written with acks=all. If a brief network interruption occurs (a