We are archiving data in order to draw conclusions from it in the future, so yes, we expect to grow continuously. In the meantime I have learned to go for predictable growth per partition rather than unpredictably large partitions. Today we are growing by 250,000,000 records per day going into a single table, and heading towards about 100 times that number this year. A partition will grow by one record per day, which should give us good horizontal scalability, but it means 250,000,000 to 25,000,000,000 partitions. I hope these numbers should not make me feel uncomfortable :)
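For the record, the growth arithmetic above can be sketched as follows. This is just a restatement of the numbers from this thread (250M records/day, roughly 100x growth, one record per partition per day), not new information:

```python
# Growth figures as stated in this thread.
records_per_day_now = 250_000_000        # current ingest into a single table
growth_factor = 100                      # "about 100 times that number this year"
records_per_day_target = records_per_day_now * growth_factor

# Each partition grows by one record per day, so the partition count
# roughly tracks the per-day record count.
partitions_now = records_per_day_now         # 250,000,000 partitions
partitions_target = records_per_day_target   # 25,000,000,000 partitions

print(partitions_now, partitions_target)  # 250000000 25000000000
```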
Sent from my iPhone

> On 20.02.2018 at 21:39, Jeff Jirsa <jji...@gmail.com> wrote:
>
> At a past job, we set the limit at around 60 hosts per cluster - anything bigger than that got single token. Anything smaller, and we'd just tolerate the inconveniences of vnodes. But that was before the new vnode token allocation went into 3.0, and it really assumed things that may not be true for you (it was a cluster that started at 60 hosts and grew up to 480 in steps, so we'd want to grow quickly - having single token allowed us to grow from 60 to 120 in 2 days, then 120 to 180 in 2 days, and so on).
>
> Are you always going to be growing, or is it a short/temporary thing? There are users of vnodes (at big, public companies) that go up into the hundreds of nodes.
>
> Most people running Cassandra start sharding clusters rather than going past a thousand or so nodes - I know there's at least one person I talked to in IRC with a 1700-host cluster, but that'd be beyond what I'd ever do personally.
>
>> On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>
>> Thanks Jeff, your answer is really not what I expected to learn - which is again more manual work as soon as we start really using C*. But I'm happy to be able to learn it now, while there is still time to acquire the necessary skills and ask the right questions about how to correctly drive big data with C* before we actually start using it, and I'm glad to have people like you around caring about these questions. Thanks. This still convinces me I have bet on the right horse, even if it might become a rough ride.
>>
>> By the way, is it possible to migrate towards smaller token ranges? What is the recommended way of doing so? And what number of nodes is the typical 'break-even'?
>> Sent from my iPhone
>>
>>> On 20.02.2018 at 21:05, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>> The scenario you describe is the typical point where people move away from vnodes and towards single-token-per-node (or a much smaller number of vnodes).
>>>
>>> The default setting puts you in a situation where virtually all hosts are adjacent/neighbors to all others (at least until you're way into the hundreds of hosts), which means you'll stream from nearly all hosts. If you drop the number of vnodes from ~256 to ~4, ~8, or ~16, you'll see the number of streams drop as well.
>>>
>>> Many people with "large" clusters statically allocate tokens to make it predictable - if you have a single token per host, you can add multiple hosts at a time, each streaming from a small number of neighbors, without overlap.
>>>
>>> It takes a bit more tooling (or manual token calculation) outside of Cassandra, but it works well in practice for "large" clusters.
>>>
>>>> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>>>
>>>> Hi, I'm wondering whether it is possible, resp. whether it would make sense, to limit concurrent streaming when joining a new node to the cluster.
>>>>
>>>> I'm currently operating a 15-node C* cluster (v3.11.1) and joining another node every day. 'nodetool netstats' shows it always streams data from all other nodes.
>>>>
>>>> How far will this scale? What happens when I have hundreds or even thousands of nodes?
>>>>
>>>> Has anyone experience with such a situation?
>>>>
>>>> Thanks and regards,
>>>> Jürgen
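The "manual token calculation" Jeff mentions for single-token-per-node clusters can be sketched like this. A minimal example, assuming the default Murmur3Partitioner, whose token range spans -2^63 to 2^63-1; the helper name is my own, not a Cassandra API:

```python
def evenly_spaced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values across the Murmur3Partitioner
    token range (-2**63 .. 2**63 - 1), one token per node."""
    range_size = 2**64
    return [(i * range_size // node_count) - 2**63 for i in range(node_count)]

# Example: tokens for a 4-node cluster. Each value would be set as
# initial_token in cassandra.yaml on the corresponding node.
for node, token in enumerate(evenly_spaced_tokens(4)):
    print(f"node {node}: initial_token = {token}")
```

With tokens spaced this way, each node owns an equal slice of the ring, and - as described above - a joining node streams only from its small number of neighbors rather than from every host.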