Glad to hear it, and thanks for dropping this here for the record ;-).
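For anyone tracking down something similar later: a quick way to spot cores being held back by P-state / power-management misbehaviour is to compare each core's current frequency with its hardware maximum while the node is under load. A minimal sketch, assuming Linux with the cpufreq sysfs interface exposed (the 80% cut-off is only illustrative):

#!/usr/bin/env python3
# Sketch: flag cores running well below their maximum frequency.
# Assumes /sys/devices/system/cpu/cpuN/cpufreq/ is available (Linux cpufreq).
from pathlib import Path

def read_khz(path: Path) -> int:
    # cpufreq sysfs files report frequencies in kHz
    return int(path.read_text().strip())

for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    cur_f = cpu / "cpufreq" / "scaling_cur_freq"
    max_f = cpu / "cpufreq" / "cpuinfo_max_freq"
    if not (cur_f.exists() and max_f.exists()):
        continue  # cpufreq not exposed for this core
    cur, top = read_khz(cur_f), read_khz(max_f)
    pct = 100.0 * cur / top
    note = "  <-- well below max" if pct < 80 else ""
    print(f"{cpu.name}: {cur / 1000:.0f} MHz / {top / 1000:.0f} MHz ({pct:.0f}%){note}")

Frequencies naturally drop on idle cores, so the output only means something when the machine is actually busy.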
C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-12 17:15 GMT+02:00 Jacek Luczak <difrost.ker...@gmail.com>:
> Hi Alain,
>
> that was actually a HW issue. The nodes that were behaving badly had a
> buggy BIOS that was doing some bad things with power management. That
> all resulted in wrong handling of P-states, and the CPUs were not going
> up to full speed. It took a while for me to find that out, but now all
> is fine and we are running at full speed.
>
> Cheers,
> -Jacek
>
> 2016-06-23 9:48 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
> > Hi,
> >
> > Sorry no one got back to you yet. Do you still have the issue?
> >
> > It's unclear to me what produces this yet. A few ideas though:
> >
> >> We are quite pedantic about OS settings. All nodes got the same settings
> >> and C* configuration.
> >
> > Considering this hypothesis, I hope that's 100% true.
> >
> > 2 nodes behaving badly out of 6 makes me think of an unbalanced cluster. Do
> > you use RF=2 there? Do you have wide rows or unbalanced data (partition
> > keys not well distributed)?
> >
> > Could you check and paste the output of nodetool cfstats and nodetool
> > cfhistograms on the most impacted tables?
> >
> > Could those nodes have hardware issues of some kind?
> >
> > C*heers,
> > -----------------------
> > Alain Rodriguez - al...@thelastpickle.com
> > France
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > 2016-06-02 13:43 GMT+02:00 Jacek Luczak <difrost.ker...@gmail.com>:
> >>
> >> Hi,
> >>
> >> I've got a 6-node C* cluster (all nodes are equal in both OS and HW
> >> setup; they are DL380 Gen9 with Smart Array RAID 50,3 on SAS 15K HDDs)
> >> which has recently been upgraded from 2.2.5 to 3.5. As part of the
> >> upgrade I've run upgradesstables.
> >>
> >> On 4 nodes the average request size issued to the block device was never
> >> higher than 8 sectors (which maps to 4K reads), while on the remaining 2
> >> nodes it was basically always maxed out at 512 (256K reads).
> >>
> >> The nodes doing 4K reads were pumping at most 2K read IOPS, while the
> >> other 2 nodes never went above 30 IOPS.
> >>
> >> We are quite pedantic about OS settings. All nodes got the same settings
> >> and C* configuration. On all nodes the block device uses the noop
> >> scheduler and read-ahead aligned with the stripe size.
> >>
> >> During heavy read workloads we've also noticed that those 4 nodes can
> >> swing up to 10K IOPS to get data from storage; the other 2 stay well
> >> below that.
> >>
> >> What can cause such a difference?
> >>
> >> -Jacek
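For completeness, the request-size figures in the original report work out as below, assuming they come from iostat's avgrq-sz column, which is expressed in 512-byte sectors:

# Minimal arithmetic sketch: convert an avgrq-sz value (512-byte sectors) to KiB.
SECTOR_BYTES = 512

def avgrq_to_kib(avgrq_sz_sectors: float) -> float:
    """Average request size in sectors -> size in KiB."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024.0

print(avgrq_to_kib(8))    # 4.0   -> the 4K reads seen on the 4 nodes
print(avgrq_to_kib(512))  # 256.0 -> the 256K reads seen on the other 2 nodes

In other words, 4 of the nodes were issuing 4 KiB reads while the other 2 were issuing 256 KiB reads, as described in the thread.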