Hi Alain, that was actually a HW issue. The nodes that were misbehaving had a buggy BIOS that mishandled power management. As a result, P-states were handled incorrectly and the CPUs never reached full speed. It took me a while to track it down, but now all is fine and we are running at full speed.
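A minimal sketch for checking whether CPUs actually reach full speed. It assumes a Linux host whose cpufreq driver exposes `scaling_cur_freq` and `cpuinfo_max_freq` under `/sys/devices/system/cpu/cpu*/cpufreq/`, both reported in kHz. The `slow_cpus` helper and its 0.9 threshold are hypothetical illustrations, not something taken from this thread. When the firmware manages power states itself, these files may be missing or misleading, so treat this as a first check rather than a diagnosis.

```python
import glob
import os


def read_khz(path):
    """Read a cpufreq sysfs value; the kernel reports these in kHz."""
    with open(path) as f:
        return int(f.read().strip())


def slow_cpus(sysfs_root="/sys/devices/system/cpu", threshold=0.9):
    """Return (cpu, cur_khz, max_khz) for each CPU running below threshold * max.

    Hypothetical helper: assumes a cpufreq driver (e.g. acpi-cpufreq or
    intel_pstate) exposes scaling_cur_freq and cpuinfo_max_freq. CPUs that
    lack either file are skipped.
    """
    results = []
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the cpufreq/ or cpuidle/ dirs.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        cur_path = os.path.join(cpufreq_dir, "scaling_cur_freq")
        max_path = os.path.join(cpufreq_dir, "cpuinfo_max_freq")
        if not (os.path.exists(cur_path) and os.path.exists(max_path)):
            continue  # driver does not expose one of the files for this CPU
        cur_khz = read_khz(cur_path)
        max_khz = read_khz(max_path)
        if cur_khz < threshold * max_khz:
            results.append((cpu, cur_khz, max_khz))
    return results
```

Call `slow_cpus()` while the node is under load and compare the result across nodes; idle cores legitimately clock down, so a reading taken on an idle box says little.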
Cheers,
-Jacek

2016-06-23 9:48 GMT+02:00 Alain RODRIGUEZ <arodr...@gmail.com>:
> Hi,
>
> Sorry no one got back to you yet. Do you still have the issue?
>
> It's unclear to me what produces this yet. A few ideas though:
>
>> We are quite pedantic about OS settings. All nodes have the same settings
>> and C* configuration.
>
> Considering this hypothesis, I hope that's 100% true.
>
> 2 nodes behaving badly out of 6 makes me think of an unbalanced cluster. Do
> you use RF=2 there? Do you have wide rows or unbalanced data (partition
> keys not well distributed)?
>
> Could you check and paste the output from nodetool cfstats and nodetool
> cfhistograms on the most impacted tables?
>
> Could those nodes have hardware issues of some kind?
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-06-02 13:43 GMT+02:00 Jacek Luczak <difrost.ker...@gmail.com>:
>>
>> Hi,
>>
>> I've got a 6-node C* cluster (all nodes are identical in both OS and HW
>> setup: DL380 Gen9 with Smart Array RAID 50,3 on SAS 15K HDDs), which
>> was recently upgraded from 2.2.5 to 3.5. As part of the upgrade I ran
>> upgradesstables.
>>
>> On 4 nodes the average request size issued to the block device was never
>> higher than 8 (that maps to 4K reads), while on the remaining 2 nodes it
>> was basically always maxed out at 512 (256K reads).
>>
>> The nodes doing 4K reads were pushing at most 2K read IOPS, while the
>> other 2 nodes never went above 30 IOPS.
>>
>> We are quite pedantic about OS settings. All nodes have the same settings
>> and C* configuration. On all nodes the block device has the noop scheduler
>> set and read-ahead aligned with the strip size.
>>
>> During heavy read workloads we've also noticed that those 4 nodes can
>> swing up to 10K IOPS to get data from storage, while the other 2 stay far
>> below that.
>>
>> What could cause such a difference?
>>
>> -Jacek
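The request sizes in the original question are consistent with iostat's `avgrq-sz` column, which older sysstat releases report in 512-byte sectors: 8 sectors is 4 KiB and 512 sectors is 256 KiB. A minimal sketch of that conversion, plus a rough throughput estimate as IOPS × request size, assuming the figures came from `avgrq-sz` and reading "2K IOPS" as 2000. The helper names are hypothetical.

```python
SECTOR_BYTES = 512  # unit of iostat's avgrq-sz column (assumed; older sysstat)


def request_kib(avgrq_sz_sectors):
    """Convert an average request size in 512-byte sectors to KiB."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


def throughput_mib_s(iops, avgrq_sz_sectors):
    """Approximate read throughput in MiB/s as IOPS x average request size."""
    return iops * request_kib(avgrq_sz_sectors) / 1024


# Peak figures quoted in the thread: 4 nodes at ~8 sectors and up to 2K IOPS,
# 2 nodes at ~512 sectors and at most 30 IOPS.
for label, iops, sectors in [("4K-read nodes", 2000, 8), ("256K-read nodes", 30, 512)]:
    print(f"{label}: {request_kib(sectors):.0f} KiB/request, "
          f"~{throughput_mib_s(iops, sectors):.2f} MiB/s")
```

With these inputs the two groups work out to roughly 7.8 MiB/s and 7.5 MiB/s, so the IOPS gap on its own does not show which nodes were moving more data.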