This is great information - thank you!
I'm coming from HDFS+Hbase, lots of nodes, nodes with many spindles.
When a drive fails in this environment (which happens a lot with 16-24
drives per node), HDFS removes that one failed volume and then maintains
the 3x replication with the rest of the cluster. As long as that drive
is not the boot volume, we can then replace the failed drive live,
typically without even doing a reboot of that one node. We'll usually
have two drives in a RAID-1 mirror for boot, and then JBOD for the data
drives.
If a data drive fails on a Cassandra server, does the whole node come down?
-Joe
On 1/20/2021 12:13 PM, Durity, Sean R wrote:
This is a great way to think through the problem and solution. I will
add that part of my calculation on failure time is how long does it
take to actually replace a drive and/or a server with (however many)
drives? We pay for very fast vendor SLAs. However, in reality, there
has been quite a bit more activity before any of those SLAs kicks in
and then the hardware is actually ready for use by Cassandra. So, I
calculate my needed capacity and preferred node sizes with those
factors included. (This is for on-prem hardware, not a
cloud-there’s-always-a-spare model.)
Sean Durity
*From:* Jeff Jirsa <jji...@gmail.com>
*Sent:* Wednesday, January 20, 2021 11:59 AM
*To:* cassandra <user@cassandra.apache.org>
*Subject:* [EXTERNAL] Re: Node Size
Not going to give a number other than to say that 1TB/instance is
probably super super super conservative in 2021. The modern number is
likely considerably higher. But let's look at this from first
principles. There are basically two things to worry about here:
1) Can you get enough CPU/memory to support a query load over that
much data, and
2) When that machine fails, what happens?
Let's set aside 1, because you can certainly find some query pattern
that works, e.g. write-only with time window compaction or something
where there's very little actual work to maintain state.
So focusing on 2, a few philosophical notes:
2.a) For each range, cassandra streams from one replica. That means if
you use a single token and RF=3, you're probably streaming from 3
hosts at a time
2.b) In Cassandra 0.whatever to 3.11, streaming during replacement
presumed that you would only send a portion of each data file to the
new node, so it deserialized and reserialized most of the contents,
even if the whole file was being sent (in LCS, sending the whole file
is COMMON; in TWCS / STCS, it's less common).
2.c) Each data file doing the partial-file streaming ser/deser uses
exactly one core/thread on the receiving side. Adding extra CPU doesn't
speed up streaming when you have to serialize/deserialize.
2.d) The more disks you put into a system, the more likely it is that
some disk on the host fails, so your frequency of failure goes up with
disk count.
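To put a rough number on 2.d, here's a back-of-envelope sketch in
Python (the 2% annualized failure rate is an assumed placeholder, not a
measured number; plug in your drives' real AFR):

    # P(at least one disk on a host fails within a year) vs. disk count.
    afr = 0.02  # assumed annualized failure rate per disk (placeholder)

    for disks in (1, 8, 16, 24):
        p_any = 1 - (1 - afr) ** disks
        print(f"{disks:2d} disks: P(>=1 disk failure/year) = {p_any:.1%}")

At an assumed 2% AFR, a single-disk host sees a failure in about 2% of
years, but a 24-disk host sees at least one failure in roughly 38% of
years.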
What's that mean?
The time it takes to rebuild a failed node depends on:
- Whether or not you're using vnodes (recalling that Joey at Netflix
did some fun math showing that lots of vnodes make your chance of
outage/data loss go up very, very quickly)
- Whether or not you're using LCS (recalling that LCS is super IO
intensive compared to other compaction strategies)
- Whether or not you're running RAID on the host
Vnodes mean more streaming sources, but also increase your chance of
an outage with concurrent host failures.
LCS means streaming is faster, but also requires a lot more IO to
maintain.
RAID is ... well, RAID. You're still doing the same type of rebuild
operation there, and losing capacity, so ... don't do that, probably.
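Combining 2.a and 2.c gives a rough rebuild-time estimate: throughput
is bounded by the number of concurrent streams times a single-core
ser/deser rate. A sketch with assumed inputs (the per-stream rate in
particular is something to measure on your own hardware):

    # Rough rebuild-time estimate. Streaming is bottlenecked at one
    # core/thread per incoming stream (2.c), so throughput scales with
    # the number of source replicas (2.a), not with total CPU.
    data_per_node_tb = 4.0  # assumed data per node (placeholder)
    streams = 3             # e.g. single token, RF=3 -> ~3 sources
    mb_per_stream = 50.0    # assumed ser/deser-limited MB/s per stream

    total_mb = data_per_node_tb * 1024 * 1024
    hours = total_mb / (streams * mb_per_stream) / 3600
    print(f"~{hours:.1f} hours to restream {data_per_node_tb:.0f} TB")

Under those assumptions, 4 TB restreams in about 8 hours; the window
grows linearly with data per node.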
If you are clever enough to run more than one Cassandra instance on
the host, you protect yourself from the "bad" vnode behaviors (the
increased likelihood of an outage with 2 hosts down, the inability to
do simultaneous host joins/leaves/moves, etc.), but it requires
multiple IPs and a lot more effort.
So, how much data can you put onto a machine? Calculate your failure
rate. Calculate your rebuild time. Figure out your chances of two
failures in that same window, and the cost to your business of an
outage/data loss if that were to happen. Keep adjusting fill sizes /
ratios until you get numbers that work for you.
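As a sketch of that calculation (every input here is an assumption to
replace with your own measurements):

    # Given one node failure, estimate the chance a second node fails
    # before the rebuild finishes. All inputs are placeholders.
    nodes = 100           # assumed cluster size
    node_afr = 0.05       # assumed annualized failure rate per node
    rebuild_hours = 24.0  # plug in your rebuild-time estimate

    window_years = rebuild_hours / (24 * 365)
    # Per-node probability of failing within the rebuild window
    p_window = 1 - (1 - node_afr) ** window_years
    # P(at least one of the remaining nodes fails during the window)
    p_second = 1 - (1 - p_window) ** (nodes - 1)
    print(f"P(second failure during rebuild) ~= {p_second:.2%}")

Note that with many vnodes nearly any concurrent second failure
intersects some shared token range (Joey's math again), while with
single tokens only a few replica neighbors matter, so weight that
probability accordingly.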
On Wed, Jan 20, 2021 at 7:59 AM Joe Obernberger
<joseph.obernber...@gmail.com <mailto:joseph.obernber...@gmail.com>>
wrote:
Thank you Sean and Yakir. Is 4.x the same?
So if you were to build a 1 PByte system, you would want 512-1024
nodes? That doesn't seem space-efficient vs., say, 48 TByte nodes,
where you would need ~21 machines.
What would you do to build a 1 PByte configuration? I know there are a
lot of "it depends" answers to that question, but say it was a
write-heavy, light-read setup. Thank you!
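Spelling out my arithmetic (taking 1 PByte as ~1000 TBytes and ignoring
replication factor, which scales every option equally):

    # Node counts needed to hold ~1 PB at different per-node densities.
    import math

    total_tb = 1000  # ~1 PByte of data
    for node_tb in (1, 2, 3.5, 48):
        print(f"{node_tb:>4} TB/node -> {math.ceil(total_tb / node_tb):4d} nodes")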
-Joe
On 1/20/2021 10:06 AM, Durity, Sean R wrote:
Yakir is correct. While it is feasible to have large disk
nodes, the practical aspect of managing them is an issue. With
the current technology, I do not build nodes with more than
about 3.5 TB of disk available. I prefer 1-2 TB, but
costs/number of nodes can change the considerations.
Putting more than 1 node of Cassandra on a given host is also
possible, but you will want to consider your availability if
that hardware goes down. Losing 2 or more nodes with one
failure is usually not good.
NOTE: DataStax has some new features for supporting much
larger disks and alleviating many of the admin pains
associated with them. I don't have personal experience with it
yet, but I will be testing it soon. In my understanding it is
for use cases with massive disk needs but low-to-moderate
throughput (i.e., where node expansion is only for disk, not
additional traffic).
Sean Durity
*From:* Yakir Gibraltar <yaki...@gmail.com>
<mailto:yaki...@gmail.com>
*Sent:* Wednesday, January 20, 2021 9:21 AM
*To:* user@cassandra.apache.org <mailto:user@cassandra.apache.org>
*Subject:* [EXTERNAL] Re: Node Size
It is possible to use large nodes and they will work; the
problems with large nodes will be:
* Maintenance operations like joining/removing nodes will take more time.
* Larger heap
* etc.
On Wed, Jan 20, 2021 at 3:54 PM Joe Obernberger
<joseph.obernber...@gmail.com
<mailto:joseph.obernber...@gmail.com>> wrote:
Anyone know where I could find out more information on this?
Thanks!
-Joe
On 1/13/2021 8:42 AM, Joe Obernberger wrote:
> Reading the documentation on Cassandra 3.x, there are recommendations
> that node size should be ~1 TByte of data. Modern servers can have 24
> SSDs, each 2 TBytes in size, for data. Is that a bad idea for
> Cassandra? Does 4.0beta4 handle larger nodes?
> We have machines that have 16 8-TByte SATA drives - would that be a
> bad server for Cassandra? Would it make sense to run multiple copies
> of Cassandra on the same node in that case?
>
> Thanks!
>
> -Joe
>
--
*Best regards,*
*Yakir Gibraltar*