Hi Jonathan,
The IO stats are below. I am not sure why one node always has a much higher
kB_read/s than the other nodes. That does not seem right.
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          54.78  24.48     9.35     0.96    0.08  10.35

Device:      tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda        2.31      14.64      17.95     1415348     1734856
xvdf      252.68   11789.51    6394.15  1139459318   617996710
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.71   6.57     3.96     0.50    0.19  66.07

Device:      tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda        1.12       3.63      10.59     3993540    11648848
xvdf       68.20     923.51    2526.86  1016095212  2780187819
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.31   8.08     3.70     0.26    0.23  65.42

Device:      tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda        1.07       2.87      10.89     3153996    11976704
xvdf       34.48     498.21    2293.70   547844196  2522227746
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.75   8.13     3.82     0.36    0.21  64.73

Device:      tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda        1.10       3.20      11.33     3515752    12442344
xvdf       44.45     474.30    2511.71   520758840  2757732583
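For reference, the kB_read/s figure for the data volume can be pulled out of each node's `iostat -k` snapshot with a short awk filter, which makes the four nodes easy to compare side by side. This is just a sketch using the first node's numbers as sample input; it assumes the data volume is named xvdf on every node.

```shell
# Sample snapshot from the first node above (device lines only).
iostat_sample='Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 2.31 14.64 17.95 1415348 1734856
xvdf 252.68 11789.51 6394.15 1139459318 617996710'

# Print the kB_read/s column ($3) for the data volume (xvdf) only.
echo "$iostat_sample" | awk '$1 == "xvdf" { print $3 }'
# prints 11789.51
```

On a live node the same filter can be applied directly, e.g. `iostat -k | awk '$1 == "xvdf" { print $3 }'`. Note that iostat's first report is cumulative since boot, so sampling over an interval (e.g. `iostat -k 5 2` and reading the second report) gives a truer picture of the current rate.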
On Thu, Jul 7, 2016 at 6:54 PM, Jonathan Haddad <[email protected]> wrote:
> What's your CPU looking like? If it's low, check your IO with iostat or
> dstat. I know some people have used EBS and say it's fine, but I've been
> burned too many times.
> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <[email protected]> wrote:
>
>> Hi Riccardo,
>>
>> Very low IO-wait. About 0.3%.
>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>> dropped messages.
>>
>>
>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>> Pool Name                    Active  Pending  Completed  Blocked  All time blocked
>> MutationStage                     1        1  929509244        0                 0
>> ViewMutationStage                 0        0          0        0                 0
>> ReadStage                         4        0    4021570        0                 0
>> RequestResponseStage              0        0  731477999        0                 0
>> ReadRepairStage                   0        0     165603        0                 0
>> CounterMutationStage              0        0          0        0                 0
>> MiscStage                         0        0          0        0                 0
>> CompactionExecutor                2       55      92022        0                 0
>> MemtableReclaimMemory             0        0       1736        0                 0
>> PendingRangeCalculator            0        0          6        0                 0
>> GossipStage                       0        0     345474        0                 0
>> SecondaryIndexManagement          0        0          0        0                 0
>> HintsDispatcher                   0        0          4        0                 0
>> MigrationStage                    0        0         35        0                 0
>> MemtablePostFlush                 0        0       1973        0                 0
>> ValidationExecutor                0        0          0        0                 0
>> Sampler                           0        0          0        0                 0
>> MemtableFlushWriter               0        0       1736        0                 0
>> InternalResponseStage             0        0       5311        0                 0
>> AntiEntropyStage                  0        0          0        0                 0
>> CacheCleanupExecutor              0        0          0        0                 0
>> Native-Transport-Requests       128      128  347508531        2          15891862
>>
>> Message type      Dropped
>> READ                    0
>> RANGE_SLICE             0
>> _TRACE                  0
>> HINT                    0
>> MUTATION                0
>> COUNTER_MUTATION        0
>> BATCH_STORE             0
>> BATCH_REMOVE            0
>> REQUEST_RESPONSE        0
>> PAGED_RANGE             0
>> READ_REPAIR             0
>>
>>
>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <[email protected]>
>> wrote:
>>
>>> Hi Yuan,
>>>
>>> Your machine instance has 4 vCPUs, which means 4 threads (not cores!). Aside
>>> from any Cassandra-specific discussion, a system load of 10 on a 4-thread
>>> machine is way too much in my opinion. If that is the running average
>>> system load, I would look deeper into the system details. Is it IO wait? Is
>>> it stolen CPU? Is this a Cassandra-only instance, or are there other
>>> processes pushing up the load?
>>> What does your "nodetool tpstats" say? How many dropped messages do you
>>> have?
>>>
>>> Best,
>>>
>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <[email protected]>
>>> wrote:
>>>
>>>> Thanks Ben! From the post, it seems they got slightly better but similar
>>>> results to mine. Good to know.
>>>> I am not sure whether a little fine-tuning of the heap memory would help.
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Yuan,
>>>>>
>>>>> You might find this blog post a useful comparison:
>>>>>
>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>
>>>>> Although the focus is on Spark and Cassandra and multi-DC there are
>>>>> also some single DC benchmarks of m4.xl clusters plus some discussion of
>>>>> how we went about benchmarking.
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>>
>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <[email protected]> wrote:
>>>>>
>>>>>> Yes, here is my stress test result:
>>>>>> Results:
>>>>>> op rate : 12200 [WRITE:12200]
>>>>>> partition rate : 12200 [WRITE:12200]
>>>>>> row rate : 12200 [WRITE:12200]
>>>>>> latency mean : 16.4 [WRITE:16.4]
>>>>>> latency median : 7.1 [WRITE:7.1]
>>>>>> latency 95th percentile : 38.1 [WRITE:38.1]
>>>>>> latency 99th percentile : 204.3 [WRITE:204.3]
>>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>>> latency max : 1408.4 [WRITE:1408.4]
>>>>>> Total partitions : 1000000 [WRITE:1000000]
>>>>>> Total errors : 0 [WRITE:0]
>>>>>> total gc count : 0
>>>>>> total gc mb : 0
>>>>>> total gc time (s) : 0
>>>>>> avg gc time(ms) : NaN
>>>>>> stdev gc time(ms) : 0
>>>>>> Total operation time : 00:01:21
>>>>>> END
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <[email protected]> wrote:
>>>>>>
>>>>>>> Lots of variables you're leaving out.
>>>>>>>
>>>>>>> It depends on the write size, whether you're using logged batches,
>>>>>>> what consistency level, what RF, whether the writes come in bursts,
>>>>>>> etc. However, that's all somewhat moot for determining "normal";
>>>>>>> really you need a baseline, as all those variables end up mattering
>>>>>>> a huge amount.
>>>>>>>
>>>>>>> I would suggest using cassandra-stress to establish a baseline and
>>>>>>> going from there depending on what those numbers say (just pick the
>>>>>>> defaults).
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <[email protected]> wrote:
>>>>>>>
>>>>>>> yes, it is about 8k writes per node.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>>>>>>
>>>>>>>>
>>>>>>>> Daemeon C.M. Reiydelle
>>>>>>>> USA (+1) 415.501.0198
>>>>>>>> London (+44) (0) 20 8144 9872
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Writes at 30k/second is the main thing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Assuming you meant 100k, that is likely for something with 16MB of
>>>>>>>>>> storage (probably way too small) where the data is more than 64KB
>>>>>>>>>> and hence will not fit into the row cache.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have a cluster of 4 m4.xlarge nodes (4 vCPUs, 16 GB memory, and
>>>>>>>>>>> 600 GB SSD EBS each).
>>>>>>>>>>> I can reach cluster-wide write requests of 30k/second and read
>>>>>>>>>>> requests of about 100/second. The cluster OS load is constantly
>>>>>>>>>>> above 10. Is that normal?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Yuan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>> ————————
>>>>> Ben Slater
>>>>> Chief Product Officer
>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>> +61 437 929 798
>>>>>
>>>>
>>>>
>>>
>>