Re: Replication Worker and targetBookie.

Venkateswara Rao Jujjuri Thu, 08 Mar 2018 11:46:47 -0800

On Thu, Mar 8, 2018 at 11:33 AM, Sijie Guo <guosi...@gmail.com> wrote:


> On Thu, Mar 8, 2018 at 8:07 AM, Venkateswara Rao Jujjuri <
> jujj...@gmail.com>
> wrote:
>
> > On Thu, Mar 8, 2018 at 2:38 AM, Ivan Kelly <iv...@apache.org> wrote:
> >
> > > > Given that RackAwareEnsemble policy defaults to finding a replacement
> > > > bookie within
> > > > the same rack, when a bookie is lost in a rack, the entire cluster
> will
> > > be
> > > > replicating
> > > > to the same 'rack'. This puts a lot of pressure on the rack and also
> > > takes
> > > > a longer time
> > > > to bring up the replication levels.
> > >
> > > I agree this has potential to be problematic.
> > >
> > > Perhaps we should provide a switch to RackAwareEnsemble,
> > > 'preferReplaceInSameRack'.
> > >
> > > > I would think the right fix is to bring back the targetBookie concept
> > > (with
> > > > a configuration parameter) and add placement check predicate on top
> of
> > > it.
> > > > When this is configured
> > > > each bookie picks up the work,  checks if the ensemble placement
> policy
> > > > gets satisfied,
> > > > if so replicate it, if not move on.
> > >
> > > I don't think adding a predicate argument (I guess a
> > > BiPredicate<Set<BookieSocketAddress>, BookieSocketAddress>?) to the
> > > recover bookie call makes sense. There is already a way to customize
> > > this behaviour, by passing in an EnsemblePlacementPolicy on
> > > Configuration of the client. The behaviour you want can be achieved by
> > > taking one of the current EnsemblePlacementPolicies and overriding
> > > replaceBookie, though I guess that's not very user-friendly. However,
> > > even if it was user-friendly, how would we make it easy for users to
> > > supply a placement policy, or even a predicate as you suggested, to
> > > the autorecovery daemon?
> > >
> >
> > In the old model, if the bookie is writable
> > AND is not part of the ensemble, replicate to the local (target) bookie.
> > My proposal is to add another AND condition:
> >
> > if the bookie is writable AND not part of the ensemble AND satisfies the
> > ensemble placement policy,
> > write to the local (target) bookie.
> >
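
To make the three-way check above concrete, here is a minimal sketch.
It is not BookKeeper code: bookies are plain strings, the rack map is
supplied by the caller, and the names shouldReplicateLocally and
distinctRacks are hypothetical. The placement check shown (no two
ensemble members in the same rack) is only one assumed example of what
"satisfies the ensemble placement policy" could mean.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.BiPredicate;

final class TargetBookieCheck {

    // Hypothetical helper: decide whether the local (target) bookie should
    // take over a fragment that lived on failedBookie.
    static boolean shouldReplicateLocally(
            String localBookie,
            boolean localWritable,
            Set<String> ensemble,
            String failedBookie,
            BiPredicate<Set<String>, String> placementOk) {
        // Condition 1: the local bookie must be writable.
        if (!localWritable) {
            return false;
        }
        // Condition 2: the local bookie must not already be in the ensemble.
        if (ensemble.contains(localBookie)) {
            return false;
        }
        // Condition 3 (the proposed addition): the ensemble that results from
        // swapping in the local bookie must satisfy the placement check.
        Set<String> remaining = new HashSet<>(ensemble);
        remaining.remove(failedBookie);
        return placementOk.test(remaining, localBookie);
    }

    // Assumed example predicate: the candidate's rack must differ from the
    // rack of every remaining ensemble member.
    static BiPredicate<Set<String>, String> distinctRacks(Map<String, String> rackOf) {
        return (others, candidate) -> others.stream()
                .noneMatch(b -> Objects.equals(rackOf.get(b), rackOf.get(candidate)));
    }
}
```

A worker that fails condition 3 simply moves on, leaving the ledger for
a bookie whose rack does fit.
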
>
>
> I think this is a good change to take, but we need to differentiate two cases:
>
> 1) if bookie recovery is running as a separate daemon, we don't need to make any
> changes here.
>

Why do you say so?

replaceBookie() -> selectFromNetworkLocation always tries to replace
the bookie from the "bookieToReplace" rack. It falls back to a random
bookie anywhere in the cluster on BKNotEnoughBookiesException.

This has two issues.

1. One rack gets the entire responsibility (write load) for replacing the lost bookie.

2. If there is no space on this rack, we pick randomly from the cluster without
honoring the placement policy. (This is a generic problem.)
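
Both issues show up in a small sketch of the selection order described
above. This is a simplified model, not the actual
selectFromNetworkLocation code: the class and method names are
hypothetical, bookies are plain strings, and it takes the first match
instead of a random one so the result is deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

final class ReplacementSketch {

    // Hypothetical model of "same rack first, then anywhere".
    static Optional<String> pickReplacement(
            String bookieToReplace,
            Map<String, String> rackOf,
            List<String> available,
            Set<String> exclude) {
        String rack = rackOf.get(bookieToReplace);

        // Pass 1: only bookies in the failed bookie's rack are considered,
        // so that rack absorbs all re-replication writes (issue 1).
        for (String b : available) {
            if (!exclude.contains(b) && Objects.equals(rack, rackOf.get(b))) {
                return Optional.of(b);
            }
        }

        // Pass 2: the fallback ignores racks entirely, so the pick can land
        // in a rack the ensemble already uses (issue 2).
        for (String b : available) {
            if (!exclude.contains(b)) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }
}
```

With an ensemble of b1 (r1), b2 (r2), b3 (r3) and b1 lost, a spare
bookie in r1 is always chosen first; if none exists, a spare in r2 can
be chosen even though b2 is already there.
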



> 2) if bookie recovery is running along with the bookie, we construct an
> ensemble placement policy that wraps the rack-aware/region-aware one
> and overrides the predicate to take the local bookie into account.
>
>
> However, there is a problem with this "predicate" model: it can
> potentially churn the metadata storage, since some bookies are then
> doing nothing but polling the under-replicated ledger list. So a long-term
> direction is to change how the auditor distributes replication tasks to
> replication workers.
>

Generally it is a watch on the underreplicated znode, right?

JV

>
>
>
> >
> > Thanks,
> > JV
> >
> >
> > > -Ivan
> > >
> >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
