On Wed, May 24, 2017, 21:49 Sijie Guo <guosi...@gmail.com> wrote:

> On Tue, May 16, 2017 at 11:39 PM, Venkateswara Rao Jujjuri <
> jujj...@gmail.com> wrote:
>
> > We have this use case too. I believe introducing "pools" is the right
> > approach for this.
> > Pools are a very high-level abstraction: they are treated simply as two
> > different clusters, but wrapped into one.
> >
> > Some of the high-level thoughts:
> >
> > * Pools are a top-level abstraction.
> > * A pool is assigned at ledger creation time (based on some criteria
> >   at the client).
> > * Ensemble changes and replication happen only within that pool of
> >   bookies.
> > * Stats and storage capacity are tracked at pool level.
> > * Capadd is to a particular pool.
> > * Each pool of bookies may run with a different server configuration.
> > * Client configuration should accommodate pools too, with different
> >   configuration values under different pools.
> >
>
> JV, can you come up with more details about the pool thing? Have you
> considered using pools for 128-bit ledger ids?
>
News?

Enrico

> - Sijie
>
> >
> > JV
> >
> > On Tue, May 16, 2017 at 7:49 AM, Bobby Evans
> > <ev...@yahoo-inc.com.invalid> wrote:
> >
> > > OK so I am not keen on the idea of labels. Probably because when I
> > > have seen it done in the past (YARN) it just felt like a hack that
> > > was trying to avoid fixing the real underlying problem. YARN wanted
> > > to schedule for arbitrary resources, but that is hard, so they went
> > > with Node Labels instead. Node labels have evolved in YARN and are
> > > now used for partitioning a cluster for isolation as well (although
> > > it really is because network scheduling/isolation is hard).
> > >
> > > Now that I am done with my YARN node label rant, I want to add that
> > > HBase put in an option for isolating table groups from each other on
> > > different region servers that has worked really well for a
> > > multi-tenant setup, so I am not completely opposed to the idea. I
> > > just want to be sure we do it right.
> > >
> > > In my opinion, if this is a feature to isolate different groups from
> > > each other to avoid one bad actor impacting everyone else, I would
> > > prefer to see something with quotas for clients and/or users, and
> > > nodes reporting their capabilities + current usage instead. If you
> > > want some kind of affinity because you bought hardware to handle
> > > longer-term vs. shorter-term storage, then I would prefer to see that
> > > called out explicitly when the ledger is created instead of having
> > > arbitrary labels. That way a long-lived ledger could be placed on a
> > > node with lots of free capacity and short-lived ledgers can go
> > > anywhere. A client could either set it when they create a ledger or
> > > fall back to a default in the config if it is not specified.
> > >
> > > If we do go with labels, I want to be sure that we stress that users
> > > should keep their matching rules as simple as possible.
> > > Hard partitioning of a cluster on labels provides a lot of
> > > possibilities to shoot yourself in the foot and not notice it.
> > > Users need to make sure that they have ways to easily monitor bookies
> > > grouped in the same way their client rules do. They also need to make
> > > sure that, when doing a rolling upgrade, they take the client rules
> > > into account when deciding what to take out and upgrade, to avoid
> > > making a group of clients completely unusable.
> > >
> > > - Bobby
> > >
> > > On Tuesday, May 16, 2017, 6:05:21 AM CDT, Enrico Olivelli <
> > > eolive...@gmail.com> wrote:
> > >
> > > Hi bookkeepers,
> > > I'm using BookKeeper for several projects. Every project has its own
> > > workload characteristics, and I would like to be able to assign
> > > bookies depending on the client type. It is quite common to share a
> > > BookKeeper cluster between different applications.
> > >
> > > For instance, I am using bookies to store database logs and task
> > > broker logs, and recently I have started to use BookKeeper as data
> > > storage.
> > >
> > > Within the cluster I would like to use specific bookies for mid-term
> > > storage, some bookies for logs, and so on, but the current placement
> > > policies are not able to "distinguish" bookies.
> > >
> > > Actually I can achieve my goal by using a custom policy + custom
> > > metadata + out-of-band bookie metadata.
> > >
> > > I would like to introduce a first step, following the work on
> > > "Resource aware data placement" [1], and introduce a list of "labels"
> > > to be assigned to every bookie.
> > >
> > > For instance: bookies for long-term storage will have the label
> > > "long-term", bookies for transaction logs may have the label "wals".
> > >
> > > Another use case is to be able to request BookKeeper to write ledger
> > > data on specific sets of bookies depending on the "customer" who is
> > > the owner of the data (I have customers already grouped by
> > > labels/tags).
> > >
> > > I would like to have a simple "standard" policy which uses some
> > > "standard" metadata to select bookies.
> > >
> > > Things to add:
> > > - a set of "labels" configurable for bookies
> > > - enrich the API (getBookieInfo) to query for labels, and have the
> > >   BookKeeper client keep a local cache of label-to-bookie assignments
> > > - add a standard "custom metadata field" which is a list of labels to
> > >   use to select bookies; a bookie would be used only if it currently
> > >   "has" all of the labels requested
> > >
> > >
> > > [1] https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement
> > >
> > > All comments are welcome
> > >
> > > -- Enrico
> > >
> >
> >
> > --
> > Jvrao
> > ---
> > First they ignore you, then they laugh at you, then they fight you, then
> > you win. - Mahatma Gandhi
> >
>
--

-- Enrico Olivelli
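
For readers following the thread, the matching rule from the original proposal ("a bookie would be used only if it currently has all of the labels requested") can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name LabelPlacementSketch, the method selectBookies, and the bookie addresses are hypothetical and are not part of the BookKeeper API or of any existing placement policy. The label-to-bookie map stands in for the client-side cache of label assignments mentioned in the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in BookKeeper.
public class LabelPlacementSketch {

    /**
     * Returns the bookies whose label set contains every requested label,
     * in the iteration order of the given map.
     */
    public static List<String> selectBookies(Map<String, Set<String>> bookieLabels,
                                             Set<String> requestedLabels) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : bookieLabels.entrySet()) {
            // A bookie qualifies only if it "has" all of the requested labels.
            if (entry.getValue().containsAll(requestedLabels)) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the client's local cache of label assignments.
        Map<String, Set<String>> bookieLabels = new LinkedHashMap<>();
        bookieLabels.put("bookie1:3181", Set.of("long-term"));
        bookieLabels.put("bookie2:3181", Set.of("wals"));
        bookieLabels.put("bookie3:3181", Set.of("long-term", "wals"));

        // prints [bookie1:3181, bookie3:3181]
        System.out.println(selectBookies(bookieLabels, Set.of("long-term")));
        // prints [bookie3:3181]
        System.out.println(selectBookies(bookieLabels, Set.of("long-term", "wals")));
    }
}
```

Note that with this rule an empty set of requested labels matches every bookie, so ledgers created without labels could still land anywhere. Whether that is the desired default is one of the questions the thread leaves open.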