Hi Alain,

that was actually a HW issue. The nodes that were behaving badly had a
buggy BIOS that was mishandling power management. As a result, P-states
were handled incorrectly and the CPUs never reached full speed. It took
me a while to track this down, but all is fine now and we are running
at full speed.
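
For anyone chasing a similar symptom, here is a minimal sketch of one way
to spot CPUs that are stuck below their rated speed. It is not the exact
procedure used on these nodes. It assumes a Linux host that exposes the
cpufreq sysfs files scaling_cur_freq and cpuinfo_max_freq; the function
names and the 0.9 threshold are hypothetical choices for illustration.

```python
import glob
import os


def read_khz(path):
    """Return the integer kHz value stored in a cpufreq sysfs file, or None."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def throttled_cpus(base="/sys/devices/system/cpu", threshold=0.9):
    """List (cpu, cur_khz, max_khz) for CPUs running below threshold * max.

    Assumption: the kernel exposes cpufreq data under `base`. CPUs without
    readable cpufreq files are skipped rather than reported.
    """
    slow = []
    for cpu_dir in sorted(glob.glob(os.path.join(base, "cpu[0-9]*"))):
        cur = read_khz(os.path.join(cpu_dir, "cpufreq", "scaling_cur_freq"))
        top = read_khz(os.path.join(cpu_dir, "cpufreq", "cpuinfo_max_freq"))
        if cur is None or top is None:
            continue
        if cur < threshold * top:
            slow.append((os.path.basename(cpu_dir), cur, top))
    return slow


if __name__ == "__main__":
    for cpu, cur, top in throttled_cpus():
        print(f"{cpu}: {cur} kHz of {top} kHz max")
```

Idle cores legitimately clock down, so a check like this is only
meaningful while the node is under load, and it is best compared
between a good node and a bad one.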

Cheers,
-Jacek

2016-06-23 9:48 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
> Hi,
>
> Sorry no one got back to you yet. Do you still have the issue?
>
> It's unclear to me what produces this yet. A few ideas though:
>
>> We are quite pedantic about OS settings. All nodes have the same
>> settings and C* configuration.
>
>
> Considering this hypothesis, I hope that's 100% true.
>
> 2 nodes out of 6 behaving badly makes me think of an unbalanced cluster. Do
> you use RF=2 there? Do you have wide rows or unbalanced data (partition
> keys not well distributed)?
>
> Could you check and paste the output from nodetool cfstats and nodetool
> cfhistograms on the most impacted tables?
>
> Could those nodes have hardware issues of some kind?
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2016-06-02 13:43 GMT+02:00 Jacek Luczak <difrost.ker...@gmail.com>:
>>
>> Hi,
>>
>> I've got a 6 node C* cluster (all nodes are equal both in OS and HW
>> setup, they are DL380 Gen9 with Smart Array RAID 50,3 on SAS 15K HDDs)
>> which was recently upgraded from 2.2.5 to 3.5. As part of the
>> upgrade I ran upgradesstables.
>>
>> On 4 nodes the average request size issued to the block dev was never
>> higher than 8 (that maps to 4K reads) while on the remaining 2 nodes
>> it was basically always maxed out at 512 (256K reads).
>>
>> Nodes doing 4K reads were pumping max 2K read IOPS while the 2 nodes
>> never went above 30 IOPS.
>>
>> We are quite pedantic about OS settings. All nodes have the same
>> settings and C* configuration. On all nodes the block dev has the noop
>> scheduler set and read ahead aligned with strip size.
>>
>> During heavy read workloads we've also noticed that those 4 nodes can
>> swing up to 10K IOPS to get data from storage, while the other 2 stay
>> far below that.
>>
>> What can cause such a difference?
>>
>> -Jacek
>
>

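The request sizes in the original question (8 and 512) line up with 4K
and 256K reads if they are counted in 512-byte sectors. The sketch below
works through that arithmetic. The 512-byte sector unit is an assumption
inferred from the 8 -> 4K mapping, "2K IOPS" is read as 2000, and the
helper names are hypothetical.

```python
SECTOR_BYTES = 512  # assumed unit behind the "8 maps to 4K" figure


def request_size_kib(sectors):
    """Convert an average request size in sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024


def throughput_mib_s(iops, sectors):
    """Approximate read throughput in MiB/s for a given IOPS and request size."""
    return iops * sectors * SECTOR_BYTES / (1024 * 1024)


if __name__ == "__main__":
    # Figures quoted in the thread: (label, request size in sectors, read IOPS)
    for label, sectors, iops in [("4 good nodes", 8, 2000), ("2 bad nodes", 512, 30)]:
        print(f"{label}: {request_size_kib(sectors):.0f} KiB per request, "
              f"~{throughput_mib_s(iops, sectors):.2f} MiB/s at {iops} IOPS")
```

Under these assumptions both groups move roughly the same number of
bytes per second, so comparing IOPS alone hides the request-size
difference.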