Re: Bookie labels and Placement policy

Sijie Guo Wed, 24 May 2017 12:49:43 -0700

On Tue, May 16, 2017 at 11:39 PM, Venkateswara Rao Jujjuri <
[email protected]> wrote:


> We have this use case too. I believe introducing "pools" is the right
> approach for this.
> Pools are a very high-level abstraction: each pool is treated as simply a
> separate cluster, with all of them wrapped into one.
>
> Some of the high level thoughts:
>
> * Pools are a top-level abstraction.
> * A pool is assigned at ledger creation time (based on some criteria
> at the client).
> * Ensemble changes and replication happen only within that pool of bookies.
> * Stats and storage capacity are tracked at the pool level.
> * Capacity additions (capadd) go to a particular pool.
> * Each pool of bookies may run with different server configurations.
> * Client configuration should accommodate pools too, with different
> configuration values under different pools.
>
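
One way to read the list above, as a minimal sketch only: the class and
method names below (Pool, PooledCluster, create_ledger, replace_bookie) are
hypothetical and are not BookKeeper APIs. The sketch assumes the pool is
fixed at ledger creation and that ensemble changes only ever draw from that
same pool.

```python
from dataclasses import dataclass, field
import random


@dataclass
class Pool:
    """A hypothetical pool: a named group of bookies tracked as one unit."""
    name: str
    bookies: set = field(default_factory=set)

    def add_capacity(self, bookie):
        # Capacity is always added to one specific pool.
        self.bookies.add(bookie)


class PooledCluster:
    """Several pools wrapped into one logical cluster (illustrative only)."""

    def __init__(self):
        self.pools = {}
        self.ledger_pool = {}  # ledger id -> pool name, fixed at creation

    def add_bookie(self, pool_name, bookie):
        self.pools.setdefault(pool_name, Pool(pool_name)).add_capacity(bookie)

    def create_ledger(self, ledger_id, pool_name, ensemble_size):
        pool = self.pools[pool_name]
        if len(pool.bookies) < ensemble_size:
            raise ValueError(f"pool {pool_name!r} has too few bookies")
        self.ledger_pool[ledger_id] = pool_name
        return sorted(random.sample(sorted(pool.bookies), ensemble_size))

    def replace_bookie(self, ledger_id, ensemble, failed):
        # Ensemble changes stay inside the ledger's own pool.
        pool = self.pools[self.ledger_pool[ledger_id]]
        candidates = sorted(pool.bookies - set(ensemble))
        if not candidates:
            raise ValueError("no spare bookie left in this pool")
        replacement = random.choice(candidates)
        return [replacement if b == failed else b for b in ensemble]
```

A failed bookie in a "logs" ensemble is then only ever replaced by another
"logs" bookie, even when other pools have spare capacity.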

JV, can you share more details about the pool idea? Have you
considered using the pool for the 128-bit ledger id?

- Sijie




>
> JV
>
> On Tue, May 16, 2017 at 7:49 AM, Bobby Evans <[email protected]>
> wrote:
>
> > OK so I am not keen on the idea of labels.  Probably because when I have
> > seen it done in the past (YARN) it just felt like a hack that was trying
> > to avoid fixing the real underlying problem. YARN wanted to schedule for
> > arbitrary resources but that is hard so they went with Node Labels
> > instead.  Node labels have evolved in YARN and are now used for
> > partitioning a cluster for isolation as well (although it really is
> > because network scheduling/isolation is hard).
> >
> > Now that I am done with my YARN node label rant I want to add that HBase
> > put in an option for isolating table groups from each other on different
> > region servers that has worked really well for a multi-tenant setup, so I
> > am not completely opposed to the idea; I just want to be sure we do it
> > right.
> >
> > In my opinion, if this is a feature to isolate different groups from each
> > other to avoid one bad actor impacting everyone else, I would prefer to
> > see something with quotas for clients and/or users, and nodes reporting
> > their capabilities + current usage, instead.  If you want some kind of
> > affinity because you bought hardware to handle longer term vs shorter
> > term storage, then I would prefer to see that called out explicitly when
> > the ledger is created instead of having arbitrary labels.  That way a
> > long lived ledger could be placed on a node with lots of free capacity
> > and short lived ledgers can go anywhere.  A client could either set it
> > when they create a ledger or rely on a default in the config if it is
> > not specified.
> >
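
The explicit-hint alternative described above could look roughly like this.
It is a sketch under assumptions: the "short"/"long" values, the
DEFAULT_LIFETIME setting and choose_bookies are made up for illustration,
and free capacity is assumed to be reported per bookie as a plain number.

```python
import random

DEFAULT_LIFETIME = "short"  # hypothetical client config default


def choose_bookies(free_bytes, ensemble_size, lifetime=None, rng=random):
    """Pick an ensemble from {bookie: free_bytes} using a lifetime hint."""
    lifetime = lifetime or DEFAULT_LIFETIME  # fall back to the config value
    if lifetime not in ("short", "long"):
        raise ValueError(f"unknown lifetime hint: {lifetime!r}")
    if ensemble_size > len(free_bytes):
        raise ValueError("not enough bookies for the requested ensemble")
    if lifetime == "long":
        # Long-lived ledgers go to the bookies with the most free capacity
        # (ties broken by bookie name to keep the result deterministic).
        ranked = sorted(free_bytes, key=lambda b: (-free_bytes[b], b))
        return ranked[:ensemble_size]
    # Short-lived ledgers can go anywhere.
    return rng.sample(sorted(free_bytes), ensemble_size)
```

The hint is a property of the ledger rather than of the bookie, so no
per-bookie labelling is needed for this case.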
> > If we do go with labels I want to be sure that we stress that users
> > should keep their matching rules as simple as possible.
> > Hard partitioning of a cluster on labels provides plenty of ways to
> > shoot yourself in the foot and not notice it.
> > They need to make sure that they have ways to easily monitor bookies
> > grouped in the same way their client rules do.  They need to make sure
> > that, when doing a rolling upgrade, they take the client rules into
> > account when deciding what to take out and upgrade, to avoid making a
> > group of clients completely unusable.
> >
> > - Bobby
> >
> > On Tuesday, May 16, 2017, 6:05:21 AM CDT, Enrico Olivelli <
> > [email protected]> wrote:
> >
> > Hi bookkeepers,
> > I'm using BookKeeper for several projects; every project has its own
> > workload characteristics and I would like to be able to assign bookies
> > depending on the client type. It is quite common to share a BookKeeper
> > cluster between different applications.
> >
> > For instance I am using Bookies to store Database logs, Task Brokers
> > logs and recently I have started to use BookKeeper as data storage.
> >
> > Within the cluster I would like to use specific Bookies for mid-term
> > storage, some bookies for logs...and so on, but current placement
> > policies are not able to "distinguish" bookies.
> >
> > Actually I can achieve my goal by using a custom policy + custom
> > metadata + out of band bookie metadata.
> >
> > I would like to introduce a first step, following the work on
> > "Resource aware data placement" [1], and introduce a list of "labels"
> > to be assigned to every bookie.
> >
> > For instance: bookies for long term storage will have label
> > "long-term", bookies for transaction logs may have label "wals".
> >
> > Another use case is to be able to request BookKeeper to write ledger
> > data on specific sets of bookies depending on the "customer" who is
> > the owner of data (I have customers already grouped by labels/tags)
> >
> > I would like to have a simple "standard" policy which uses some
> > "standard" metadata to select bookies.
> >
> > Things to add:
> > - a set of "labels" configurable for bookies
> > - enrich the API (getBookieInfo) to query for labels, and have the
> > BookKeeper client keep a local cache of label-to-bookie assignments
> > - add a standard "custom metadata field" which is a list of labels to
> > use to select bookies; a bookie would be used only if it currently
> > "has" all of the labels requested
> >
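
The matching rule in the last item above ("has" all of the labels requested)
can be sketched as follows. The function names and the shape of the cache
are assumptions for illustration; how labels would actually be returned by
getBookieInfo is not specified in this thread.

```python
def build_label_cache(bookie_labels):
    """Invert {bookie: labels} into a local {label: set of bookies} cache."""
    cache = {}
    for bookie, labels in bookie_labels.items():
        for label in labels:
            cache.setdefault(label, set()).add(bookie)
    return cache


def eligible_bookies(cache, all_bookies, requested_labels):
    """Return the bookies that carry every requested label."""
    eligible = set(all_bookies)
    for label in requested_labels:
        # Intersect with the bookies known to have this label; an unknown
        # label matches no bookie at all.
        eligible &= cache.get(label, set())
    return sorted(eligible)
```

With bookies labelled {"b1": ["long-term"], "b2": ["wals"], "b3":
["long-term", "wals"]}, a ledger requesting both "long-term" and "wals"
could only be placed on b3, while a ledger requesting no labels could use
any bookie.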
> >
> > [1] https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement
> >
> > All comments are welcome
> >
> > -- Enrico
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
