Re: [VOTE] KIP-112 - Handle disk failure for JBOD

Dong Lin Thu, 27 Apr 2017 16:13:17 -0700

Thanks for the vote Jun!

I think that statement is probably OK because it assumes that the broker has
bad log directories. If all log directories are good, the replica should be
created in one of the good log directories. It is clarified in the wiki
that "Even if isNewReplica=false and replica is not found on any log
directory, broker will still create replica on a good log directory if
there is no bad log directory.".
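
To make the cases above concrete, here is a minimal sketch of the decision as I read it from this thread. It is not the actual broker code: the function name, the parameters, and the Action enum are hypothetical, and it assumes the broker has at least one good log directory and answers with a storage error when it declines to create the replica.

```python
from enum import Enum


class Action(Enum):
    USE_EXISTING = "use the replica found on a good log directory"
    CREATE_ON_GOOD_DIR = "create the replica on a good log directory"
    REPORT_STORAGE_ERROR = "do not create; report a storage error"


def handle_leader_and_isr(is_new_replica, found_on_good_dir, bad_dir_count):
    """Hypothetical helper: decide what to do for one replica named in a
    LeaderAndIsrRequest. Assumes at least one good log directory exists."""
    if found_on_good_dir:
        # The replica is already present on a healthy disk.
        return Action.USE_EXISTING
    if is_new_replica:
        # The controller says the replica is new, so nothing can be lost.
        return Action.CREATE_ON_GOOD_DIR
    if bad_dir_count == 0:
        # isNewReplica=false and the replica is missing, but every log
        # directory is good (e.g. the replica was deleted manually).
        return Action.CREATE_ON_GOOD_DIR
    # isNewReplica=false, the replica is missing, and some directory is bad:
    # the replica may live on the bad disk, so it is not recreated
    # (assumption: the broker reports a storage error instead).
    return Action.REPORT_STORAGE_ERROR
```

Jun's case corresponds to handle_leader_and_isr(False, False, 0), which creates the replica on a good log directory.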



On Thu, Apr 27, 2017 at 4:07 PM, Jun Rao <[email protected]> wrote:

> Hi, Dong,
>
> Thanks for the proposal. +1. Just one minor comment.
>
> In "3. Broker bootstraps with bad log directories", when a broker receives
> a LeaderAndIsrRequest with isNewReplica=False but the replica is not found
> on any good log directory, if all log directories are good, it seems that
> we should create the replica in one of the good log directories? This can
> happen if a replica is manually deleted from the log directory.
>
> Jun
>
> On Wed, Apr 26, 2017 at 11:27 AM, Dong Lin <[email protected]> wrote:
>
> > Thanks for the vote!
> >
> > Discussed with Joel offline. I have updated the KIP to specify that the
> > controller will consider a replica to be offline if KafkaStorageException
> > is specified for the replica in the LeaderAndIsrResponse. The other two
> > improvements may be done in a future KIP.
> >
> >
> >
> > > On Wed, Apr 26, 2017 at 10:30 AM, Joel Koshy <[email protected]> wrote:
> >
> > > +1
> > >
> > > Discussed a few edits/improvements with Dong.
> > >
> > > - Rather than a blanket (Error != None) condition for detecting offline
> > > replicas you probably want a storage exception-specific error code.
> > >
> > > > - Definitely in favor of improvement #7 and it shouldn’t be too hard to do.
> > > > When bouncing with a log directory on a faulty disk, the condition may be
> > > > detected while loading logs and you may not have the full list of local
> > > > replicas. So a subsequent L&ISR request would recreate the replica on the
> > > > good disks (which may or may not be what the user wants).
> > > >
> > > > - Another improvement worth investigating is how best to support partition
> > > > reassignments even with a bad disk. The wiki hints that this is unnecessary
> > > > because reassignments being disallowed with an offline replica is similar
> > > > to the current state of handling an offline broker. With JBOD though the
> > > > broker with a bad disk does not have to be offline anymore so it should be
> > > > possible to support reassignments even with offline replicas. I'm not
> > > > suggesting this is trivial, but would better leverage JBOD.
> > >
> > > > On Wed, Apr 5, 2017 at 5:46 PM, Becket Qin <[email protected]> wrote:
> > >
> > > > +1
> > > >
> > > > > Thanks for the KIP. Made a pass and had some minor changes.
> > > >
> > > > > On Mon, Apr 3, 2017 at 3:16 PM, radai <[email protected]> wrote:
> > > >
> > > > > +1, LGTM
> > > > >
> > > > > > On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > > It seems that there is no further concern with the KIP-112. We would like
> > > > > > > to start the voting process. The KIP can be found at
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD
> > > > > >
> > > > > > Thanks,
> > > > > > Dong
> > > > > >
> > > > >
> > > >
> > >
> >
>
