Oh, I'm sorry, that's my fault. Thank you for the correction, Jon. While it's possible in theory, I didn't realize that the remaining 256-token nodes would receive relatively more and more data with each successive decommission of a 256-token node.
Regards,
Kyrill

________________________________________
From: Jon Haddad <jonathan.had...@gmail.com> on behalf of Jon Haddad <j...@jonhaddad.com>
Sent: Saturday, February 24, 2018 5:44:24 PM
To: user@cassandra.apache.org
Subject: Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

You can’t migrate down that way. The last several nodes you have up will get completely overwhelmed, and you’ll be completely screwed. Please do not give advice like this unless you’ve actually gone through the process or at least have an understanding of how the data will be shifted. Adding nodes with 16 tokens while decommissioning the ones with 256 will be absolute hell. You can only do this by adding a new DC and retiring the old.

On Feb 24, 2018, at 2:26 AM, Kyrylo Lebediev <kyrylo_lebed...@epam.com<mailto:kyrylo_lebed...@epam.com>> wrote:

> By the way, is it possible to migrate towards smaller token ranges? What is the recommended way of doing so?

I didn't see this question answered. I think the easiest way to do this is to add new C* nodes with a lower vnode count (8 or 16 instead of the default 256), then decommission the old nodes with vnodes=256.

Thanks, guys, for shedding some light on this Java multithreading-related scalability issue. BTW, how can one tell from JVM / OS metrics that the number of threads in a JVM is becoming a bottleneck?

Also, I'd like to add a comment: the higher the number of vnodes per node, the lower the overall reliability of the cluster. Replicas for a token range are placed on the nodes responsible for the next+1, next+2 ranges (not taking into account NetworkTopologyStrategy / snitch, which help, but seemingly not much in terms of probabilities). The higher the number of vnodes per node, the higher the probability that all nodes in the cluster become 'neighbors' in terms of token ranges.
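The failure-probability argument above can be put in rough numbers. A minimal sketch, assuming independent node failures with a per-node unavailability `p` (a simplifying assumption; the function name is illustrative):

```python
from math import comb  # Python 3.8+

def p_at_least_k_down(n, p, k=2):
    """Probability that at least k of n nodes are unavailable at the
    same time, modelling nodes as independent Bernoulli failures."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# With 256 vnodes per node, nearly every pair of nodes shares at least
# one token range, so P(>= 2 down) roughly approximates P(some range
# loses quorum at RF=3, CL=QUORUM).
for n in (3, 100, 200):
    print(n, p_at_least_k_down(n, p=0.001))
```

For small `p` this grows roughly as C(n, 2) * p^2, i.e. faster than linearly in `n`, which is why a 200-node cluster with 256 vnodes is tangibly less range-available than a 3-node cluster at the same RF.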
There's no trivial formula for the 'reliability' of a C* cluster [haven't got a chance to do the math...], but in general, with a bigger number of nodes in a cluster (like 100 or 200), the probability that 2 or more nodes are down at the same time increases roughly in proportion to the number of nodes. The most reliable C* setup is using initial_token instead of vnodes, but this makes the cluster harder to manage [less automated + there may be hotspots in the cluster in some cases].

Remark: for a C* cluster with RF=3, at any number of nodes and any initial_token/vnodes setup, there is always a possibility that simultaneous unavailability of 2 (or 3, depending on which CL is used) nodes will lead to unavailability of a token range ('HostUnavailable' exception). No miracles: reliability is mostly determined by the RF number. The task that must be solved for large clusters: "Reliability of a cluster with NNN nodes and RF=3 shouldn't be 'tangibly' less than reliability of a 3-node cluster with RF=3."

Kind Regards,
Kyrill

________________________________
From: Jürgen Albersdorfer <jalbersdor...@gmail.com<mailto:jalbersdor...@gmail.com>>
Sent: Tuesday, February 20, 2018 10:34:21 PM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?

Thanks Jeff, your answer is really not what I expected to learn - which is again more manual work as soon as we start really using C*. But I‘m happy to be able to learn it now, while there is still time to acquire the necessary skills and ask the right questions on how to correctly drive big data with C* before we actually start using it, and I‘m glad to have people like you around caring about these questions. Thanks. This still convinces me I have bet on the right horse, even if it might become a rough ride.

By the way, is it possible to migrate towards smaller token ranges? What is the recommended way of doing so?
And which number of nodes is the typical 'break-even'?

Sent from my iPhone

On 20.02.2018 at 21:05, Jeff Jirsa <jji...@gmail.com<mailto:jji...@gmail.com>> wrote:

The scenario you describe is the typical point where people move away from vnodes and towards single-token-per-node (or a much smaller number of vnodes). The default setting puts you in a situation where virtually all hosts are adjacent/neighbors to all others (at least until you're way into the hundreds of hosts), which means you'll stream from nearly all hosts. If you drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the number of streams drop as well.

Many people with "large" clusters statically allocate tokens to make it predictable - if you have a single token per host, you can add multiple hosts at a time, each streaming from a small number of neighbors, without overlap. It takes a bit more tooling (or manual token calculation) outside of cassandra, but works well in practice for "large" clusters.

On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <jalbersdor...@gmail.com<mailto:jalbersdor...@gmail.com>> wrote:

Hi, I'm wondering whether it is possible, or would make sense, to limit concurrent streaming when joining a new node to the cluster. I'm currently operating a 15-node C* cluster (v3.11.1) and joining another node every day. 'nodetool netstats' shows it always streams data from all other nodes. How far will this scale? What happens when I have hundreds or even thousands of nodes? Does anyone have experience with such a situation?

Thanks, and regards,
Jürgen

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
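The "manual token calculation" Jeff mentions usually amounts to spacing tokens evenly around the ring. A minimal sketch, assuming the default Murmur3Partitioner token range of -2^63 .. 2^63-1 (the helper name is made up for illustration):

```python
def initial_tokens(node_count):
    """Evenly spaced Murmur3Partitioner tokens, one per node: node i
    gets min_token + i * (ring_size / node_count). Each value would be
    set as initial_token in that node's cassandra.yaml."""
    ring_size, min_token = 2**64, -2**63
    return [min_token + (ring_size // node_count) * i
            for i in range(node_count)]

# Four tokens splitting the ring into equal quarters:
print(initial_tokens(4))
```

As for the narrower original question: streaming bandwidth (though not the number of peer streams) can be capped at runtime with `nodetool setstreamthroughput <Mbit/s>`, or via stream_throughput_outbound_megabits_per_sec in cassandra.yaml.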