Hey James, Ted,

@James - Thanks for showing me some of the changes, that was informative.

* *Log Cleaner Thread Revival* - I also acknowledge that this could be useful. My concern is that if the thread has died, there is most likely something wrong with either the disk or the software, and since both are deterministic (correct me if I'm wrong), we will most likely hit the same error again very soon. I am not sure that scenario would do any good, but I am also not sure it would hurt. Could repeatedly dying and restarting waste a significant amount of CPU?
* *Partition Re-clean* - Hmm, maybe some sort of retry mechanism could be worth exploring. I'd like to hear other people's opinions on this, and whether they've seen such scenarios, before diving into a possible implementation.
* *Metric* - Could you point me to some resources showing how JMX metrics should be structured? I could not find any and am sadly not too knowledgeable on the topic.
* *uncleanable-partitions* *metric* - Yes, that might be problematic. Maybe the format Ted suggested would be best - "topic1-0,1,2". Then again, I fear we might still run out of characters. I am not sure how to best approach this yet.
* *Disk Problems* - I am aware that the 4 JIRAs are not related to disk problems. I think this KIP brings the most value in exactly such scenarios - ones where the disk is OK. But then again, I thought I'd suggest failing the disk after a certain number of errors on it, since it makes sense to me. I do not have a strong opinion about this, though. Now that you mention that this actually increases the blast radius, I tend to agree. Maybe we should scrap this behavior.

Best,
Stanislav

On Tue, Jul 24, 2018 at 6:13 AM Ted Yu <yuzhih...@gmail.com> wrote:

> As James pointed out in his reply, topic-partition name can be long.
> It is not necessary to repeat the topic name for each of its partitions.
> How about the following format:
>
> topic-name1-{partition1, partition2, etc}
>
> That is, topic name only appears once.
>
> Cheers
>
> On Mon, Jul 23, 2018 at 9:08 PM Stanislav Kozlovski <stanis...@confluent.io>
> wrote:
>
> > Hi Ted,
> >
> > Yes, absolutely. Thanks for pointing that out!
> >
> > On Mon, Jul 23, 2018 at 6:12 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > For `uncleanable-partitions`, should the example include topic name(s)?
> > >
> > > Cheers
> > >
> > > On Mon, Jul 23, 2018 at 5:46 PM Stanislav Kozlovski <stanis...@confluent.io>
> > > wrote:
> > >
> > > > I renamed the KIP and that changed the link. Sorry about that. Here is
> > > > the new link:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error
> > > >
> > > > On Mon, Jul 23, 2018 at 5:11 PM Stanislav Kozlovski <stanis...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey group,
> > > > >
> > > > > I created a new KIP about making log compaction more fault-tolerant.
> > > > > Please give it a look here and please share what you think, especially
> > > > > in regards to the points in the "Needs Discussion" paragraph.
> > > > >
> > > > > KIP: KIP-346
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Limit+blast+radius+of+log+compaction+failure>
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> >
> > --
> > Best,
> > Stanislav

--
Best,
Stanislav
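
P.S. To make the grouped `uncleanable-partitions` format from the thread a bit more concrete, here is a rough sketch of how such a value could be built. It is only an illustration, not what the KIP specifies: the function name `format_uncleanable_partitions`, the braces around the partition list, and the `;` separator between topics are all my assumptions.

```python
from collections import defaultdict


def format_uncleanable_partitions(partitions):
    """Group (topic, partition) pairs so each topic name appears only once.

    Hypothetical helper: the separators are assumptions, the thread only
    sketches the shape "topic-name1-{partition1, partition2, etc}".
    """
    by_topic = defaultdict(set)
    for topic, partition in partitions:
        by_topic[topic].add(partition)

    parts = []
    # Sort topics and partition ids so the value is stable between reads.
    for topic in sorted(by_topic):
        ids = ",".join(str(p) for p in sorted(by_topic[topic]))
        parts.append(topic + "-{" + ids + "}")
    return ";".join(parts)


flat = [("topic1", 0), ("topic1", 1), ("topic1", 2), ("topic2", 5)]
print(format_uncleanable_partitions(flat))
```

This only shortens the value by dropping repeated topic names; with many long topic names the string can still grow large, so the length concern raised above is not solved by the format alone.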