Re: Is my cluster normal?

Jonathan Haddad Thu, 07 Jul 2016 18:54:31 -0700

What's your CPU looking like? If it's low, check your IO with iostat or
dstat. I know some people have used EBS and say it's fine, but I've been
burned too many times.
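
A rough way to frame the CPU question is to divide the load average by the
number of vCPUs. The sketch below is only an illustration: the helper name
load_per_cpu is made up here, and the example figures are the ones reported
in the quoted thread (a load of about 10 on a 4-vCPU instance). On Linux the
load average also counts tasks blocked on IO, so a high ratio alone does not
say whether CPU or disk is the bottleneck.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Return the load average divided by the number of logical CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


if __name__ == "__main__":
    # Figures from the thread: load of about 10 on a 4-vCPU instance.
    print(load_per_cpu(10.0, 4))  # 2.5

    # Live values for the current host (os.getloadavg is Unix-only).
    one_min, _, _ = os.getloadavg()
    print(load_per_cpu(one_min, os.cpu_count() or 1))
```

A ratio near or below 1 usually means runnable work fits on the available
threads; the 2.5 computed from the thread's numbers is what prompted the
questions about IO wait and stolen CPU.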
On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <[email protected]> wrote:


> Hi Riccardo,
>
> Very low IO-wait. About 0.3%.
> No stolen CPU. It is a Cassandra-only instance. I did not see any dropped
> messages.
>
>
> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     1         1      929509244         0                 0
> ViewMutationStage                 0         0              0         0                 0
> ReadStage                         4         0        4021570         0                 0
> RequestResponseStage              0         0      731477999         0                 0
> ReadRepairStage                   0         0         165603         0                 0
> CounterMutationStage              0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                2        55          92022         0                 0
> MemtableReclaimMemory             0         0           1736         0                 0
> PendingRangeCalculator            0         0              6         0                 0
> GossipStage                       0         0         345474         0                 0
> SecondaryIndexManagement          0         0              0         0                 0
> HintsDispatcher                   0         0              4         0                 0
> MigrationStage                    0         0             35         0                 0
> MemtablePostFlush                 0         0           1973         0                 0
> ValidationExecutor                0         0              0         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0           1736         0                 0
> InternalResponseStage             0         0           5311         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Native-Transport-Requests       128       128      347508531         2          15891862
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     0
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
>
>
>
>
> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <[email protected]>
> wrote:
>
>> Hi Yuan,
>>
>> Your instance has 4 vCPUs, that is, 4 threads (not cores!). Aside from
>> any Cassandra-specific discussion, a system load of 10 on a 4-thread
>> machine is way too much in my opinion. If that is the running average
>> system load, I would look deeper into system details. Is that IO wait? Is
>> that stolen CPU? Is that a Cassandra-only instance, or are there other
>> processes pushing the load?
>> What does your "nodetool tpstats" say? How many dropped messages do you
>> have?
>>
>> Best,
>>
>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <[email protected]> wrote:
>>
>>> Thanks Ben! From the post, it seems they got slightly better but similar
>>> results to mine. Good to know.
>>> I am not sure whether a little fine-tuning of heap memory will help.
>>>
>>>
>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <[email protected]>
>>> wrote:
>>>
>>>> Hi Yuan,
>>>>
>>>> You might find this blog post a useful comparison:
>>>>
>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>
>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>> also some single DC benchmarks of m4.xl clusters plus some discussion of
>>>> how we went about benchmarking.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>>
>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <[email protected]> wrote:
>>>>
>>>>> Yes, here is my stress test result:
>>>>> Results:
>>>>> op rate                   : 12200 [WRITE:12200]
>>>>> partition rate            : 12200 [WRITE:12200]
>>>>> row rate                  : 12200 [WRITE:12200]
>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>> Total errors              : 0 [WRITE:0]
>>>>> total gc count            : 0
>>>>> total gc mb               : 0
>>>>> total gc time (s)         : 0
>>>>> avg gc time(ms)           : NaN
>>>>> stdev gc time(ms)         : 0
>>>>> Total operation time      : 00:01:21
>>>>> END
>>>>>
>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <[email protected]> wrote:
>>>>>
>>>>>> Lots of variables you're leaving out.
>>>>>>
>>>>>> It depends on write size, whether you're using logged batches, what
>>>>>> consistency level, what RF, whether the writes come in bursts, etc.
>>>>>> However, that's all sort of moot for determining "normal"; really you
>>>>>> need a baseline, as all those variables end up mattering a huge amount.
>>>>>>
>>>>>> I would suggest using Cassandra stress as a baseline and go from
>>>>>> there depending on what those numbers say (just pick the defaults).
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <[email protected]> wrote:
>>>>>>
>>>>>> yes, it is about 8k writes per node.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <[email protected]> wrote:
>>>>>>
>>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>>
>>>>>>>
>>>>>>> Daemeon C.M. Reiydelle
>>>>>>> USA (+1) 415.501.0198
>>>>>>> London (+44) (0) 20 8144 9872
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> writes 30k/second is the main thing.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Assuming you meant 100k, that's likely for something with 16mb of
>>>>>>>>> storage (probably way small) where the data is more than 64k and
>>>>>>>>> hence will not fit into the row cache.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Daemeon C.M. Reiydelle
>>>>>>>>> USA (+1) 415.501.0198
>>>>>>>>> London (+44) (0) 20 8144 9872
>>>>>>>>>
>>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory, and
>>>>>>>>>> 600 GB SSD EBS).
>>>>>>>>>> I can reach cluster-wide write requests of 30k/second and read
>>>>>>>>>> requests of about 100/second. The cluster OS load is constantly
>>>>>>>>>> above 10. Is that normal?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Yuan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>> ————————
>>>> Ben Slater
>>>> Chief Product Officer
>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>> +61 437 929 798
>>>>
>>>
>>>
>>
>

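The "nodetool tpstats" output quoted in this thread can also be scanned
programmatically instead of by eye. The following is a minimal sketch, not
part of Cassandra or nodetool: the names PoolStats, parse_tpstats, and
needs_attention are invented for illustration, the pending threshold of 10
is arbitrary, and the parser assumes each pool row sits on one line as a
name followed by five integer columns. The sample rows are copied from the
quoted output.

```python
from typing import List, NamedTuple


class PoolStats(NamedTuple):
    name: str
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


def parse_tpstats(text: str) -> List[PoolStats]:
    """Parse thread-pool rows from `nodetool tpstats` text output.

    Only lines made of a name plus five integer columns are kept; the
    header and the dropped-message section are skipped.
    """
    pools = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        name, *numbers = parts
        if not all(n.isdigit() for n in numbers):
            continue
        pools.append(PoolStats(name, *map(int, numbers)))
    return pools


def needs_attention(pools: List[PoolStats],
                    pending_threshold: int = 10) -> List[PoolStats]:
    """Return pools with a pending backlog or any all-time-blocked tasks."""
    return [p for p in pools
            if p.pending > pending_threshold or p.all_time_blocked > 0]


# A few rows copied from the output quoted in the thread.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
ReadStage                         4         0        4021570         0                 0
CompactionExecutor                2        55          92022         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862
"""

if __name__ == "__main__":
    for pool in needs_attention(parse_tpstats(SAMPLE)):
        print(pool.name, pool.pending, pool.all_time_blocked)
```

On the sample rows this flags CompactionExecutor (55 pending) and
Native-Transport-Requests (128 pending, 15891862 all-time blocked), the two
pools that stand out in the quoted output.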