We are archiving data in order to draw conclusions from it in the future, so yes, we expect to grow continuously. In the meantime I have learned to go for predictable growth per partition rather than unpredictably large partitions. Today we are growing by 250,000,000 records per day going into a single table, and heading towards about 100 times that number this year. A partition will grow by one record per day, which should give us good horizontal scalability, but it means 250,000,000 to 25,000,000,000 partitions. I hope these numbers should not make me feel uncomfortable :)
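For the record, the growth arithmetic above can be sketched as follows. This is just a restatement of the numbers from this thread (250M records/day, roughly 100x growth, one record per partition per day), not new information:

```python
# Growth figures as stated in this thread.
records_per_day_now = 250_000_000        # current ingest into a single table
growth_factor = 100                      # "about 100 times that number this year"
records_per_day_target = records_per_day_now * growth_factor

# Each partition grows by one record per day, so the partition count
# roughly tracks the per-day record count.
partitions_now = records_per_day_now         # 250,000,000 partitions
partitions_target = records_per_day_target   # 25,000,000,000 partitions

print(partitions_now, partitions_target)  # 250000000 25000000000
```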
Sent from my iPhone

> On 20.02.2018 at 21:39, Jeff Jirsa <jji...@gmail.com> wrote:
>
> At a past job, we set the limit at around 60 hosts per cluster - anything bigger than that got single token. Anything smaller, and we'd just tolerate the inconveniences of vnodes. But that was before the new vnode token allocation went into 3.0, and it really assumed things that may not be true for you (it was a cluster that started at 60 hosts and grew up to 480 in steps, so we'd want to grow quickly - having single token allowed us to grow from 60 to 120 in 2 days, then 120 to 180 in 2 days, and so on).
>
> Are you always going to be growing, or is it a short/temporary thing? There are users of vnodes (at big, public companies) that go up into the hundreds of nodes.
>
> Most people running Cassandra start sharding clusters rather than going past a thousand or so nodes - I know there's at least one person I talked to in IRC with a 1700-host cluster, but that'd be beyond what I'd ever do personally.
>
>> On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>
>> Thanks Jeff, your answer is really not what I expected to learn - which is again more manual work as soon as we start really using C*. But I'm happy to be able to learn it now, while there is still time to acquire the necessary skills and ask the right questions about how to correctly drive big data with C* before we actually start using it, and I'm glad to have people like you around caring about these questions. Thanks. This still convinces me I have bet on the right horse, even if it might become a rough ride.
>>
>> By the way, is it possible to migrate towards smaller token ranges? What is the recommended way of doing so? And what number of nodes is the typical 'break-even'?
>> Sent from my iPhone
>>
>>> On 20.02.2018 at 21:05, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>> The scenario you describe is the typical point where people move away from vnodes and towards single-token-per-node (or a much smaller number of vnodes).
>>>
>>> The default setting puts you in a situation where virtually all hosts are adjacent/neighbors to all others (at least until you're way into the hundreds of hosts), which means you'll stream from nearly all hosts. If you drop the number of vnodes from ~256 to ~4, ~8, or ~16, you'll see the number of streams drop as well.
>>>
>>> Many people with "large" clusters statically allocate tokens to make it predictable - if you have a single token per host, you can add multiple hosts at a time, each streaming from a small number of neighbors, without overlap.
>>>
>>> It takes a bit more tooling (or manual token calculation) outside of Cassandra, but it works well in practice for "large" clusters.
>>>
>>>> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>>>
>>>> Hi, I'm wondering whether it is possible, resp. whether it would make sense, to limit concurrent streaming when joining a new node to the cluster.
>>>>
>>>> I'm currently operating a 15-node C* cluster (v3.11.1) and joining another node every day. 'nodetool netstats' shows it always streams data from all other nodes.
>>>>
>>>> How far will this scale? What happens when I have hundreds or even thousands of nodes?
>>>>
>>>> Has anyone experience with such a situation?
>>>>
>>>> Thanks and regards,
>>>> Jürgen
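The "manual token calculation" Jeff mentions for single-token-per-node clusters can be sketched like this. A minimal example, assuming the default Murmur3Partitioner, whose token range spans -2^63 to 2^63-1; the helper name is my own, not a Cassandra API:

```python
def evenly_spaced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values across the Murmur3Partitioner
    token range (-2**63 .. 2**63 - 1), one token per node."""
    range_size = 2**64
    return [(i * range_size // node_count) - 2**63 for i in range(node_count)]

# Example: tokens for a 4-node cluster. Each value would be set as
# initial_token in cassandra.yaml on the corresponding node.
for node, token in enumerate(evenly_spaced_tokens(4)):
    print(f"node {node}: initial_token = {token}")
```

With tokens spaced this way, each node owns an equal slice of the ring, and - as described above - a joining node streams only from its small number of neighbors rather than from every host.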