On Thu, Mar 8, 2018 at 11:46 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com
> wrote:

> On Thu, Mar 8, 2018 at 11:33 AM, Sijie Guo <guosi...@gmail.com> wrote:
>
> > On Thu, Mar 8, 2018 at 8:07 AM, Venkateswara Rao Jujjuri <
> > jujj...@gmail.com>
> > wrote:
> >
> > > On Thu, Mar 8, 2018 at 2:38 AM, Ivan Kelly <iv...@apache.org> wrote:
> > >
> > > > > > Given that RackAwareEnsemble policy defaults to finding a replacement
> > > > > > bookie within the same rack, when a bookie is lost in a rack, the
> > > > > > entire cluster will be replicating to the same 'rack'. This puts a lot
> > > > > > of pressure on the rack and also takes a longer time to bring up the
> > > > > > replication levels.
> > > >
> > > > I agree this has potential to be problematic.
> > > >
> > > > Perhaps we should provide a switch to RackAwareEnsemble,
> > > > 'preferReplaceInSameRack'.
> > > >
> > > > > > I would think the right fix is to bring back the targetBookie concept
> > > > > > (with a configuration parameter) and add placement check predicate on
> > > > > > top of it. When this is configured each bookie picks up the work,
> > > > > > checks if the ensemble placement policy gets satisfied, if so
> > > > > > replicate it, if not move on.
> > > >
> > > > > I don't think adding a predicate argument (I guess a
> > > > > BiPredicate<Set<BookieSocketAddress>, BookieSocketAddress>?) to the
> > > > > recover bookie call makes sense. There is already a way to customize
> > > > > this behaviour, by passing in an EnsemblePlacementPolicy on
> > > > > Configuration of the client. The behaviour you want can be achieved by
> > > > > taking one of the current EnsemblePlacementPolicies and overriding
> > > > > replaceBookie, though I guess that's not very user-friendly. However,
> > > > > even if it was user-friendly, how would we make it easy for users to
> > > > > supply a placement policy, or even a predicate as you suggested, to
> > > > > the autorecovery daemon?
> > > >
> > >
> > > > In the old model, if the bookie is writable
> > > > AND is not part of the ensemble, replicate to the local (target) bookie.
> > > > My proposal is to add another AND condition:
> > > >
> > > > if the bookie is writable AND not part of the ensemble AND satisfies the
> > > > Ensemble Placement Policy, write to the local (target) bookie.
> > >
> >
> >
> > I think this is a good change to take, but we need to differentiate things:
> >
> > 1) if bookie recovery is running as a separate daemon, we don't need to
> > make any changes here.
> >
>

> Why do you say so?
>

Let me step back. The issue you raised here comprises two parts:

1) for a replication worker, which ledgers is it responsible for
replicating?

2) when replicating a ledger, how do we select the bookies for replication?


2) is a general question, no matter how the worker is running, within a
bookie or in a separate job. I think you already covered this part, and I
will comment on it at your questions below.

For 1), if the replication worker is running outside of the bookies, the
data always goes over the network, so it doesn't really matter which ledgers
are assigned to which replication workers. That is what I meant by "we don't
need to do any changes".



>
> replaceBookie() -> selectFromNetworkLocation always tries to replace
> the bookie from the "bookieToReplace" rack. It goes to random/anywhere in
> the cluster on BKNotEnoughBookiesException.
>
> This has two issues.
>
> 1. One rack gets the entire responsibility (write) to replace lost bookie.
>

This is a tradeoff. It depends on the ratio between your bookies and
racks.


>
> 2. If there is no space on this rack, we go random on the cluster, not
> honoring the placement (this is a generic problem).
>
>
Yes, that is something we should address. However, it is an issue in the
placement policy and is not related to the changes here.
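
For reference, the replacement behaviour described in the quoted text (same rack first, anywhere in the cluster as a fallback) can be sketched roughly as below. This is only an illustration under assumptions: the class, method and parameter names are hypothetical, bookies and racks are plain strings, and it is not the actual rack-aware policy code. It just shows why the fallback pass no longer honours placement.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not BookKeeper's real API.
final class SameRackFirstSketch {

    /** Returns a replacement for lostBookie, or null if no candidate exists. */
    static String replaceBookie(String lostBookie, Set<String> ensemble,
                                List<String> liveBookies, Map<String, String> rackOf) {
        String rack = rackOf.get(lostBookie);
        // Pass 1: only consider live bookies in the lost bookie's rack.
        for (String b : liveBookies) {
            if (!ensemble.contains(b) && rack.equals(rackOf.get(b))) {
                return b;
            }
        }
        // Pass 2 (fallback): any live bookie outside the ensemble, no rack check.
        for (String b : liveBookies) {
            if (!ensemble.contains(b)) {
                return b;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("b1", "A");
        rackOf.put("b2", "B");
        rackOf.put("b3", "C");
        rackOf.put("b4", "C");
        rackOf.put("b5", "A");
        Set<String> ensemble = new HashSet<>(Arrays.asList("b1", "b2", "b3"));

        // b3 (rack C) is lost and rack C still has a spare bookie.
        System.out.println(replaceBookie("b3", ensemble,
                Arrays.asList("b1", "b2", "b4", "b5"), rackOf));
        // Rack C has no spare bookie: the fallback picks a bookie from a rack
        // that the ensemble already uses.
        System.out.println(replaceBookie("b3", ensemble,
                Arrays.asList("b1", "b2", "b5"), rackOf));
    }
}
```

In the second call the chosen bookie shares a rack with a surviving ensemble member, which is the "not honoring the placement" case.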


>
>
> > 2) if bookie recovery is running along with the bookie, we construct an
> > ensemble placement policy which wraps the rack-aware/region-aware one
> > and overrides the predicate to take the local bookie into account.
> >
> >
> > However, there is a problem with this "predicate" model, because it can
> > potentially churn the metadata storage, since now some bookies are
> > actually doing nothing but polling the under-replicated ledger list. So a
> > long-term direction is to change how the auditor distributes replication
> > tasks to replication workers.
> >
>
> Generally it is a watch on the underreplicated znode, right?
>

The original behavior is that once you get a ledger, you acquire it for
replication. So a replication worker can always get a ledger to work on,
and no worker is idle.

However, with this change the replication worker will ignore a ledger if the
predicate says no. If no bookie in the cluster qualifies for replicating
this ledger, the other bookies will end up polling, because no one acquires
it. This is not a very big change in pattern, but it needs attention.
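
To make the three-way check concrete, here is a minimal sketch of the per-worker decision (writable AND not part of the ensemble AND placement satisfied). Everything in it is an assumption for illustration: the names are hypothetical, bookies and racks are plain strings, and "placement satisfied" is simplified to "the ensemble spans at least minRacks racks".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not BookKeeper's real API.
final class LocalReplicationCheck {

    /** Simplified placement rule: the ensemble must span at least minRacks racks. */
    static boolean satisfiesPlacement(Set<String> ensemble, Map<String, String> rackOf,
                                      int minRacks) {
        Set<String> racks = new HashSet<>();
        for (String bookie : ensemble) {
            racks.add(rackOf.get(bookie));
        }
        return racks.size() >= minRacks;
    }

    /** writable AND not part of the ensemble AND the resulting ensemble is well placed. */
    static boolean shouldReplicateLocally(String local, boolean writable,
                                          Set<String> ensemble, String lostBookie,
                                          Map<String, String> rackOf, int minRacks) {
        if (!writable || ensemble.contains(local)) {
            return false;
        }
        // Ensemble as it would look after swapping the lost bookie for the local one.
        Set<String> candidate = new HashSet<>(ensemble);
        candidate.remove(lostBookie);
        candidate.add(local);
        return satisfiesPlacement(candidate, rackOf, minRacks);
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("b1", "A");
        rackOf.put("b2", "B");
        rackOf.put("b3", "C");
        rackOf.put("b4", "C");
        rackOf.put("b5", "A");
        Set<String> ensemble = new HashSet<>(Arrays.asList("b1", "b2", "b3"));

        // b3 is lost. The worker on b4 (rack C) keeps the ensemble on three racks.
        System.out.println(shouldReplicateLocally("b4", true, ensemble, "b3", rackOf, 3));
        // The worker on b5 (rack A) would leave only two racks, so it skips the ledger.
        System.out.println(shouldReplicateLocally("b5", true, ensemble, "b3", rackOf, 3));
    }
}
```

If b4 is down or read-only in this example, every remaining worker says no, and the ledger stays in the under-replicated list while the workers keep polling it.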

The "predicate" approach is problematic: it can potentially cause some
ledgers to never be replicated. Ideally, this is something that should be
done by the auditor, because the auditor knows the ledgers, the alive
bookies and the network topology. The auditor should be able to compute a
replication plan and assign the corresponding ledgers to bookies. This will:

- optimize the placement
- ensure no ledgers are missed
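
A rough sketch of what such an auditor-side plan could look like is below. It is only one possible shape under assumptions, not an existing auditor feature: the names are hypothetical, bookies and racks are plain strings, and the rule is "prefer a live bookie in a rack the surviving ensemble does not use yet, break ties by lowest assigned load, and otherwise fall back to any live bookie outside the ensemble".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: hypothetical names, not BookKeeper's real API.
final class ReplicationPlanSketch {

    /**
     * Maps each under-replicated ledger id to a target bookie. survivors holds
     * the still-alive ensemble members of each ledger. A ledger is left out
     * only when no live bookie exists outside its ensemble.
     */
    static Map<Long, String> plan(Map<Long, Set<String>> survivors,
                                  List<String> liveBookies, Map<String, String> rackOf) {
        Map<Long, String> assignment = new TreeMap<>();
        Map<String, Integer> load = new HashMap<>();
        // TreeMap gives a deterministic ledger order.
        for (Map.Entry<Long, Set<String>> e : new TreeMap<>(survivors).entrySet()) {
            Set<String> usedRacks = new HashSet<>();
            for (String b : e.getValue()) {
                usedRacks.add(rackOf.get(b));
            }
            String best = null;
            boolean bestNewRack = false;
            for (String b : liveBookies) {
                if (e.getValue().contains(b)) {
                    continue; // already holds this ledger
                }
                boolean newRack = !usedRacks.contains(rackOf.get(b));
                boolean better = best == null
                        || (newRack && !bestNewRack)
                        || (newRack == bestNewRack
                            && load.getOrDefault(b, 0) < load.getOrDefault(best, 0));
                if (better) {
                    best = b;
                    bestNewRack = newRack;
                }
            }
            if (best != null) {
                assignment.put(e.getKey(), best);
                load.merge(best, 1, Integer::sum);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("b1", "A");
        rackOf.put("b2", "B");
        rackOf.put("b4", "C");
        rackOf.put("b5", "A");
        rackOf.put("b6", "C");
        Map<Long, Set<String>> survivors = new HashMap<>();
        survivors.put(1L, new HashSet<>(Arrays.asList("b1", "b2")));
        survivors.put(2L, new HashSet<>(Arrays.asList("b1", "b2")));
        survivors.put(3L, new HashSet<>(Arrays.asList("b1", "b2", "b4")));
        System.out.println(plan(survivors,
                Arrays.asList("b1", "b2", "b4", "b5", "b6"), rackOf));
    }
}
```

Because the fallback accepts any live bookie outside the ensemble, a ledger still gets a target when no rack-diverse candidate exists, which is the property the per-worker predicate cannot guarantee.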


- Sijie



>
> JV
>
> >
> >
> >
> > >
> > > Thanks,
> > > JV
> > >
> > >
> > > > -Ivan
> > > >
> > >
> > >
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> then
> > > you win. - Mahatma Gandhi
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
