Hey Colin,

It seems that when the broker recovers a segment, all snapshots after that segment are deleted. This means the active segment cannot be used for transaction-related operations if a non-active segment needs to be recovered. Thus the suggested approach may not be safe for users who need transactions in Kafka. This is the main reason the config was added to turn this feature on or off. It would be great if we could find a way to make this work for transactions as well.
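To make this concern concrete, here is a toy Python model of the behavior described above. It is not Kafka code: the function name, the representation of snapshots as plain offsets, and the rule that "after this segment" means "offset strictly greater than the segment's base offset" are all assumptions made for illustration.

```python
from bisect import bisect_right


def snapshots_after_recovery(snapshot_offsets, recovered_base_offset):
    """Toy model (hypothetical, not Kafka's API).

    Splits snapshot offsets into those kept and those deleted when the
    segment starting at `recovered_base_offset` is recovered. Assumption:
    every snapshot with an offset strictly greater than the recovered
    segment's base offset is deleted.
    """
    ordered = sorted(snapshot_offsets)
    cut = bisect_right(ordered, recovered_base_offset)
    return ordered[:cut], ordered[cut:]


# Recovering an older segment (base offset 200) drops the snapshots that
# belong to newer segments, including the one the active segment relies on.
kept, deleted = snapshots_after_recovery([100, 200, 300, 400], 200)
```

Under these assumptions, recovering any inactive segment lazily would remove the snapshots that later segments (including the active one) depend on, which is why transactional use looks unsafe.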
Another concern, which is unlikely to arise in practice, is that if many inactive log segments are corrupted but no active segment is, the broker will start successfully and later become leader for many partitions. If, right after the broker becomes leader for those partitions, many bootstrapping consumers start consuming from the partitions whose inactive segments are corrupted, all of the broker's request handler threads will be blocked waiting for segment recovery, and the broker will become unavailable for these partitions.

The latter is not a new problem in Kafka. As of today, if a problematic disk makes the broker really slow at writing to disk, the broker may also become almost unavailable because most request handler threads will be blocked on slow disk IO. In general we need a way for a broker to voluntarily give up leadership of its partitions. But until that feature is available, the changes suggested in this KIP may make this problem more likely, though it should still be rare.

Thanks,
Dong

On Fri, Mar 2, 2018 at 9:56 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Hi Dong,
>
> This seems like a nice improvement. Is there any way we could avoid
> adding a new configuration value?
>
> It's not clear to me why we would want the old behavior.
>
> best,
> Colin
>
>
> On Tue, Feb 27, 2018, at 23:57, Stephane Maarek wrote:
> > This is great and definitely needed. I'm not exactly sure of what goes in
> > the process of checking log files at startup, but is there something like
> > signature checks of files (especially closed, immutable ones) that can be
> > saved on disk and checked against at startup ? Wouldn't that help speed up
> > boot time, for all segments ?
> >
> > On 26 Feb. 2018 5:28 pm, "Dong Lin" <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I have created KIP-263: Allow broker to skip sanity check of inactive
> > > segments on broker startup. See
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-263%3A+Allow+broker+to+skip+sanity+check+of+inactive+segments+on+broker+startup
> > > .
> > >
> > > This KIP provides a way to significantly reduce time to rolling bounce a
> > > Kafka cluster.
> > >
> > > Comments are welcome!
> > >
> > > Thanks,
> > > Dong
> > >
> >