Re: [DISCUSS] KIP-263: Allow broker to skip sanity check of inactive segments on broker startup

Dong Lin Sat, 03 Mar 2018 17:43:47 -0800

Hey Colin,

It seems that when the broker recovers a segment, all snapshots after that
segment are deleted. This means the active segment cannot be used for
transaction-related operations if a non-active segment needs to be
recovered. The suggested approach may therefore be unsafe for users who rely
on transactions in Kafka. This is the main reason the KIP adds a config to
turn this feature on or off. It would be great if we could find a way to
make this work for transactions as well.
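
As a rough illustration of the concern above, here is a toy Python model
(not Kafka code). It assumes each snapshot is identified by an offset and
that "after this segment" means any snapshot offset greater than the
recovered segment's base offset. The function names and offsets are
hypothetical.

```python
def surviving_snapshots(snapshot_offsets, recovered_base_offset):
    """Snapshots left after recovering the segment at recovered_base_offset.

    Assumption: every snapshot with an offset greater than the recovered
    segment's base offset is deleted.
    """
    return sorted(o for o in snapshot_offsets if o <= recovered_base_offset)


def active_segment_has_snapshot(snapshot_offsets, active_base_offset):
    """True if some snapshot at or beyond the active segment's base remains."""
    return any(o >= active_base_offset for o in snapshot_offsets)


snapshots = [100, 200, 300]  # hypothetical snapshot offsets
active_base = 300            # hypothetical base offset of the active segment

# Recovering the inactive segment at base offset 100 after startup drops
# the later snapshots, including the one the active segment relies on.
after = surviving_snapshots(snapshots, recovered_base_offset=100)
```

In this model the active segment has a snapshot before the recovery and none
afterwards, which is the situation that would be unsafe for transactions.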

Another concern, which is unlikely to arise in practice, is the case where
many inactive log segments are corrupted but no active segment is. The
broker will then start successfully and later become leader for many
partitions. If, right after the broker becomes leader, many bootstrapping
consumers start consuming from partitions whose inactive segments are
corrupted, all of the broker's request handler threads will be blocked
waiting for segment recovery, and the broker will become unavailable for
these partitions.
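
The blocking scenario can be sketched with a small deterministic simulation.
This is a toy model, not Kafka's request handling: it assumes a fixed pool
of handlers, requests that all arrive at time 0 and are dispatched in order,
and hypothetical durations (60000 ms for a fetch that triggers segment
recovery, 10 ms for a normal request).

```python
import heapq


def completion_times(durations_ms, num_handlers):
    """Finish time of each request under in-order dispatch to a fixed pool.

    durations_ms lists how long each request occupies a handler; all
    requests are assumed to arrive at time 0.
    """
    free_at = [0] * num_handlers  # time at which each handler becomes free
    heapq.heapify(free_at)
    finished = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available handler
        end = start + duration
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


HANDLERS = 8          # hypothetical size of the request handler pool
RECOVERY_MS = 60000   # hypothetical time to recover a corrupted segment
NORMAL_MS = 10        # hypothetical time for an ordinary request

# Eight fetches each hit a corrupted inactive segment, then one ordinary
# request arrives behind them.
blocked = completion_times([RECOVERY_MS] * HANDLERS + [NORMAL_MS], HANDLERS)
healthy = completion_times([NORMAL_MS] * (HANDLERS + 1), HANDLERS)
```

Under these assumptions the ordinary request finishes at 60010 ms when every
handler is busy with recovery, versus 20 ms when no recovery is needed.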

The latter is not a new problem in Kafka. As of today, if a problematic
disk makes the broker very slow in writing to disk, the broker may also
become almost unavailable because most request handler threads will be
blocked on slow disk I/O. In general we need a way for a broker to
voluntarily give up leadership of its partitions. Until that feature is
available, the changes suggested in this KIP may make this problem more
likely, though still very rare.

Thanks,
Dong

On Fri, Mar 2, 2018 at 9:56 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Hi Dong,
>
> This seems like a nice improvement.  Is there any way we could avoid
> adding a new configuration value?
>
> It's not clear to me why we would want the old behavior.
>
> best,
> Colin
>
>
> On Tue, Feb 27, 2018, at 23:57, Stephane Maarek wrote:
> > This is great and definitely needed. I'm not exactly sure what goes on
> > in the process of checking log files at startup, but is there something
> > like signature checks of files (especially closed, immutable ones) that
> > can be saved on disk and checked against at startup? Wouldn't that help
> > speed up boot time, for all segments?
> >
> > On 26 Feb. 2018 5:28 pm, "Dong Lin" <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I have created KIP-263: Allow broker to skip sanity check of inactive
> > > segments on broker startup. See
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-263%3A+Allow+broker+to+skip+sanity+check+of+inactive+segments+on+broker+startup
> > >
> > > This KIP provides a way to significantly reduce the time needed to
> > > rolling bounce a Kafka cluster.
> > >
> > > Comments are welcome!
> > >
> > > Thanks,
> > > Dong
> > >
>
