On Thu, Mar 8, 2018 at 8:07 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
> On Thu, Mar 8, 2018 at 2:38 AM, Ivan Kelly <iv...@apache.org> wrote:
>
> > > Given that RackAwareEnsemble policy defaults to finding a replacement
> > > bookie within the same rack, when a bookie is lost in a rack, the
> > > entire cluster will be replicating to the same 'rack'. This puts a
> > > lot of pressure on the rack and also takes a longer time to bring up
> > > the replication levels.
> >
> > I agree this has potential to be problematic.
> >
> > Perhaps we should provide a switch to RackAwareEnsemble,
> > 'preferReplaceInSameRack'.
> >
> > > I would think the right fix is to bring back the targetBookie concept
> > > (with a configuration parameter) and add a placement check predicate
> > > on top of it. When this is configured, each bookie picks up the work
> > > and checks if the ensemble placement policy gets satisfied; if so, it
> > > replicates, if not, it moves on.
> >
> > I don't think adding a predicate argument (I guess a
> > BiPredicate<Set<BookieSocketAddress>, BookieSocketAddress>?) to the
> > recover bookie call makes sense. There is already a way to customize
> > this behaviour, by passing in an EnsemblePlacementPolicy on
> > Configuration of the client. The behaviour you want can be achieved by
> > taking one of the current EnsemblePlacementPolicies and overriding
> > replaceBookie, though I guess that's not very user-friendly. However,
> > even if it was user-friendly, how would we make it easy for users to
> > supply a placement policy, or even a predicate as you suggested, to
> > the autorecovery daemon?
> >
>
> In the old model, if the bookie is writable
> AND is not part of the ensemble, replicate to the local (target) bookie.
> My proposal is to add another AND condition:
>
> if bookie is writable AND not part of ensemble AND satisfies Ensemble
> Placement Policy
> write to local (target) bookie.
>

I think this is a good change to take.
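To make the three-part condition quoted above concrete, here is a minimal sketch of the check a replication worker might run before writing to its local bookie. Everything in it is an assumption for illustration: bookies are plain strings, and `TargetBookieCheck`, `PlacementCheck`, `spansAtLeast` and `shouldReplicateLocally` are hypothetical names, not BookKeeper APIs. The toy placement rule (the ensemble must still span a minimum number of racks) stands in for whatever the configured EnsemblePlacementPolicy would enforce.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: bookies are plain strings and none of these names
// exist in BookKeeper.
final class TargetBookieCheck {

    // Hypothetical hook standing in for "satisfies Ensemble Placement Policy".
    interface PlacementCheck {
        boolean isSatisfied(List<String> ensemble, String failedBookie, String candidate);
    }

    // Toy policy: the ensemble, with the candidate swapped in for the failed
    // bookie, must still span at least minRacks distinct racks.
    static PlacementCheck spansAtLeast(Map<String, String> rackOf, int minRacks) {
        return (ensemble, failedBookie, candidate) -> {
            Set<String> racks = new HashSet<>();
            for (String bookie : ensemble) {
                String member = bookie.equals(failedBookie) ? candidate : bookie;
                racks.add(rackOf.getOrDefault(member, "/default-rack"));
            }
            return racks.size() >= minRacks;
        };
    }

    // Old model: writable AND not part of the ensemble.
    // Proposal: additionally require that the placement check passes.
    static boolean shouldReplicateLocally(String localBookie,
                                          boolean localWritable,
                                          List<String> ensemble,
                                          String failedBookie,
                                          PlacementCheck check) {
        return localWritable
                && !ensemble.contains(localBookie)
                && check.isSatisfied(ensemble, failedBookie, localBookie);
    }
}
```

With racks b1=/r1, b2=/r2, b3=/r1, b4=/r2 and an ensemble of [b1, b2] that has lost b2, a worker on b4 would pick up the work, while a worker on b3 would move on because the ensemble would collapse onto a single rack.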
But we need to differentiate two cases:

1) If bookie recovery is running as a separate daemon, we don't need to
   make any changes here.
2) If bookie recovery is running along with the bookie, we construct an
   ensemble placement policy which wraps the rack-aware/region-aware one
   and overrides the predicate to take the local bookie into account.

However, there is a problem with this "predicate" model: it can potentially
churn the metadata storage, since some bookies will be doing nothing but
polling the under-replicated (UR) ledger list. So a long-term direction is
to change how the auditor distributes replication tasks to replication
workers.

>
> Thanks,
> JV
>
>
> > -Ivan
> >
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
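For the second case (recovery running along with the bookie), the wrapping idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `ReplacementPolicy` is a simplified, hypothetical stand-in for a placement policy's replaceBookie call, `ToyRackAwarePolicy` is a toy rule rather than the real rack-aware policy, and bookies are plain strings. The wrapper narrows the candidate set to the local bookie and lets the wrapped policy decide whether that choice is acceptable.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: none of these types exist in BookKeeper.
final class LocalPlacementSketch {

    // Hypothetical, simplified stand-in for a policy's replaceBookie.
    interface ReplacementPolicy {
        Optional<String> replaceBookie(List<String> ensemble,
                                       String failedBookie,
                                       Set<String> candidates);
    }

    // Toy rack-aware rule: pick the first candidate (sorted order) that is
    // not already in the ensemble and keeps the ensemble on two or more racks.
    static final class ToyRackAwarePolicy implements ReplacementPolicy {
        private final Map<String, String> rackOf;

        ToyRackAwarePolicy(Map<String, String> rackOf) {
            this.rackOf = rackOf;
        }

        @Override
        public Optional<String> replaceBookie(List<String> ensemble,
                                              String failedBookie,
                                              Set<String> candidates) {
            for (String candidate : new TreeSet<>(candidates)) {
                if (ensemble.contains(candidate)) {
                    continue;
                }
                Set<String> racks = new HashSet<>();
                for (String bookie : ensemble) {
                    String member = bookie.equals(failedBookie) ? candidate : bookie;
                    racks.add(rackOf.getOrDefault(member, "/default-rack"));
                }
                if (racks.size() >= 2) {
                    return Optional.of(candidate);
                }
            }
            return Optional.empty();
        }
    }

    // Wrapper for the co-located case: only the local bookie is offered to
    // the wrapped policy, so the result is either the local bookie or empty.
    static final class LocalBookiePolicy implements ReplacementPolicy {
        private final ReplacementPolicy delegate;
        private final String localBookie;

        LocalBookiePolicy(ReplacementPolicy delegate, String localBookie) {
            this.delegate = delegate;
            this.localBookie = localBookie;
        }

        @Override
        public Optional<String> replaceBookie(List<String> ensemble,
                                              String failedBookie,
                                              Set<String> candidates) {
            if (!candidates.contains(localBookie)) {
                return Optional.empty(); // local bookie is not available as a target
            }
            return delegate.replaceBookie(ensemble, failedBookie, Set.of(localBookie));
        }
    }
}
```

An empty result here means "move on": the worker leaves the ledger for some other bookie. That is also where the polling concern comes from, since every worker that gets an empty result has still read the under-replicated ledger list.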