This sounds worthy of a bug report! We should at least document any such inadequacy, and come up with a plan to fix it. It would be great if you could file a ticket with a detailed example of the problem.
> On 24 Sep 2018, at 14:57, Tom van der Woerdt <tom.vanderwoe...@booking.com> wrote:
>
> Late comment, but I'll write it anyway.
>
> The main advantage of random allocation over the new allocation strategy is that it seems to be significantly better when dealing with node *removals*, when the order of removal is not the inverse of the order of addition. This can lead to severely unbalanced clusters when the new strategy is enabled.
>
> I tend to go with random allocation for this reason: you can freely add/remove nodes when needed, and the data distribution will remain "good enough". It's only when the data density becomes high enough that the new token allocation strategy really matters, imho.
>
> Hope that helps!
>
> Tom van der Woerdt
> Site Reliability Engineer
> Booking.com B.V.
>
> On Sat, Sep 22, 2018 at 8:12 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Is there a use case for random allocation? How does it help with testing? I can't see a reason to keep it around.
>>
>> On Sat, Sep 22, 2018 at 3:06 AM kurt greaves <k...@instaclustr.com> wrote:
>>
>>> +1. I've been making a case for this for some time now, and it was actually a focus of my talk last week. I'd be very happy to get this into 4.0.
>>>
>>> We've tested various num_tokens with the algorithm on various sized clusters, and we've found that typically 16 works best. With lower numbers we found that balance is good initially, but as a cluster gets larger you have some problems. E.g. we saw a difference of 22% in token ownership on a 60-node cluster with 8 tokens per node, but a difference of only 12% on a <=12 node cluster. 16 tokens, on the other hand, wasn't perfect but generally gave a better balance regardless of cluster size, at least up to 100 nodes. TBH we should probably do some proper testing and record all the results before we pick a default (I'm happy to do this - I think we can use the original testing script for it).
>>>
>>> But anyway, I'd say Jon is on the right track. Personally, how I'd like to see it is that we:
>>>
>>> 1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf, in the same way that DSE does it: allowing a user to specify an RF to allocate from, and allowing multiple DCs.
>>> 2. Add a new boolean property random_token_allocation, defaulting to false.
>>> 3. Make allocate_tokens_for_rf default to unset*.
>>> 4. Make allocate_tokens_for_rf required** if num_tokens > 1 and random_token_allocation != true.
>>> 5. Default num_tokens to 16 (or whatever we find appropriate).
>>>
>>> * I think setting a default is asking for trouble. When people are going to add new DCs/nodes, we don't want to risk them adding a node with the wrong RF. I think it's safe to say that a user should have to think about this before they spin up their cluster.
>>> ** Following the above, it should be required to be set so that we don't have people accidentally using random allocation.
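For concreteness, a minimal sketch of what steps 1-5 above might look like in cassandra.yaml. Note that allocate_tokens_for_rf and random_token_allocation are the names proposed in this thread, not settings that exist in any released version:

    # Sketch of the proposed defaults (steps 1-5 above); the two new
    # setting names are proposals from this thread, not released options.
    num_tokens: 16                    # step 5: new default, down from 256

    # Steps 3 and 4: ships unset, but must be set whenever num_tokens > 1
    # and random allocation has not been explicitly enabled.
    # allocate_tokens_for_rf: 3

    random_token_allocation: false    # step 2: opt-in escape hatch only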
>>> I think we should really be aiming to get rid of random allocation completely, but provide a new property to enable it for backwards compatibility (and for testing).
>>>
>>> It's worth noting that a smaller number of tokens *theoretically* increases the time for replacement/rebuild, so if we're considering QUORUM availability with vnodes there's an argument against having a very low num_tokens. I think it's better to utilise NTS and racks to reduce the chance of a QUORUM outage than to bank on a lower number of tokens: with just a low number of tokens, unless you go all the way down to 1, you are relying on luck that 2 nodes don't overlap. I guess what I'm saying is that we should choose a num_tokens that gives the best distribution for most cluster sizes, rather than one that "decreases" the probability of an outage.
>>>
>>> Also, I think we should continue using CASSANDRA-13701 to track this. TBH, in general we should be a bit better at searching for and using existing tickets...
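As a rough back-of-envelope for the overlap argument (editorial arithmetic, not figures from the thread): with v tokens per node and replication factor r, each node shares at least one replicated range with on the order of 2v(r-1) other nodes, capped at N-1 in an N-node cluster, so the chance that a second, independent failure hits one of those neighbours is approximately

    P(\text{overlap}) \approx \min\!\left(1, \frac{2v(r-1)}{N-1}\right)

For r = 3 and N = 100, that is about 16/99 ≈ 0.16 at v = 4, but it already saturates at 1 for v = 256, which is the sense in which a high token count makes almost any two simultaneous failures overlap.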
>>> On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <s...@apache.org> wrote:
>>>
>>>> There have already been some discussions on this here:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-13701
>>>>
>>>> The blocker mentioned there on the token allocation shouldn't exist anymore, although it would be good to get more feedback on it, in case we want to enable it by default along with new defaults for the number of tokens.
>>>>
>>>> On 22.09.18 06:30, Dinesh Joshi wrote:
>>>>
>>>>> Jon, thanks for starting this thread!
>>>>>
>>>>> I have created CASSANDRA-14784 to track this.
>>>>>
>>>>> Dinesh
>>>>>
>>>>>> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisank...@gmail.com> wrote:
>>>>>>
>>>>>> Putting it on JIRA is to make sure someone is assigned to it and it is tracked. Changes should be discussed over the ML, like you are saying.
>>>>>>
>>>>>> On Sep 21, 2018, at 21:02, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>>>
>>>>>>>> We should create a JIRA to find what other defaults we need to revisit.
>>>>>>>
>>>>>>> Changing a default is a pretty big deal; I think we should discuss any changes to defaults here on the ML before moving them into JIRA. It's nice to get a bit more discussion around the change than what happens in JIRA.
>>>>>>>
>>>>>>> We (TLP) did some testing on 4 tokens and found it to work surprisingly well. It wasn't particularly formal, but we verified the load stays pretty even with only 4 tokens as we added nodes to the cluster. A higher token count hurts availability by increasing the number of nodes any given node is a neighbor of, meaning any 2 nodes that fail have an increased chance of causing downtime when using QUORUM. In addition, with the recent streaming optimization, it seems lower token counts will give a greater chance of a node streaming entire sstables (with LCS), meaning we'll do a better job with node density out of the box.
>>>>>>>
>>>>>>> Next week I can try to put together something a little more convincing. Weekend time.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 to lowering it.
>>>>>>>> Thanks Jon for starting this. We should create a JIRA to find what other defaults we need to revisit. (Please keep this discussion to the "default token" topic only.)
>>>>>>>>
>>>>>>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Also agree it should be lowered, but definitely not to 1, and probably something closer to 32 than 4.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Jirsa
>>>>>>>>>
>>>>>>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I agree that it should be lowered. What I've seen debated a bit in the past is the number, but I don't think anyone thinks it should remain 256.
>>>>>>>>>>
>>>>>>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> One thing that's really, really bothered me for a while is how we still default to 256 tokens. There's no experienced operator that leaves it as is at this point, meaning the only people using 256 are the poor folks that just got started using C*. I've worked with over a hundred clusters in the last couple of years, and I think only one of them had lowered it to something else.
>>>>>>>>>>>
>>>>>>>>>>> I think it's time we changed the default to 4 (or 8, up for debate).
>>>>>>>>>>>
>>>>>>>>>>> To improve the behavior, we need to change a couple of other things. The allocate_tokens_for_keyspace setting is... odd. It requires you to have a keyspace already created, which doesn't help on new clusters. What I'd like to do is add a new setting, allocate_tokens_for_rf, and set it to 3 by default.
>>>>>>>>>>>
>>>>>>>>>>> To handle clusters that are already using 256 tokens, we could prevent the new node from joining unless a -D flag is set to explicitly allow imbalanced tokens.
>>>>>>>>>>>
>>>>>>>>>>> We've agreed to a trunk freeze, but I feel this is important enough (and pretty trivial) to do now. I'd also personally characterize it as a bug fix, since 256 is horribly broken once the cluster gets to any reasonable size - but maybe I'm alone there.
>>>>>>>>>>>
>>>>>>>>>>> I honestly can't think of a use case where random tokens is a good choice anymore, so I'd be fine (ecstatic, even) with removing it completely and requiring either allocate_tokens_for_keyspace (for existing clusters) or allocate_tokens_for_rf to be set.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts? Objections?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jon Haddad
>>>>>>>>>>> http://www.rustyrazorblade.com
>>>>>>>>>>> twitter: rustyrazorblade
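To make the proposal above concrete, here is roughly how today's workaround compares with the proposed setting in cassandra.yaml. allocate_tokens_for_keyspace exists today; allocate_tokens_for_rf and the join guard are proposals from this thread, and both the keyspace name and the -D flag name below are hypothetical placeholders:

    # Today (3.x): allocation can only target an already-created keyspace,
    # which is the awkwardness described above for brand-new clusters.
    num_tokens: 4
    allocate_tokens_for_keyspace: my_keyspace   # placeholder keyspace name

    # Proposed: allocate for a replication factor, no keyspace needed.
    # allocate_tokens_for_rf: 3

    # Proposed guard when joining a cluster still running 256 tokens
    # (hypothetical flag name):
    #   JVM_OPTS="$JVM_OPTS -Dcassandra.allow_imbalanced_tokens=true"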