On Thu, Feb 16, 2017 at 12:38 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:
> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> That's the ReadLatency.count metric aggregated by host, which represents
> the actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing its way during node failures etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Hi there,
>>>
>>> Following situation in a cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256, so it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (matching the load).
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because its cache/load
>>> ratio is worse, but a factor of 20 makes me very skeptical.
>>>
>>> Of course Node A also has a much higher and less predictable latency
>>> due to the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints on how to analyze or optimize this? I mean, 128GB cache for
>>> 460GB payload is not that little. I am pretty sure that not the whole
>>> 460GB dataset is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
You could be correct. I also think a few things smooth out the curves:
- Intelligent clients
- Dynamic snitch

For example, when testing out an awesome JVM tune, you might see CPU usage
go down. From there you assume the tune worked, but what can happen is that
the two dynamic mechanisms shift some small % of traffic away. Those effects
cascade as well. dynamic_snitch claims to shift load once performance is
$threshold worse.
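For reference, the dynamic snitch behavior mentioned above is tunable in
cassandra.yaml. A minimal sketch with the usual defaults (check your own
version's yaml for the authoritative values):

```yaml
# Fraction by which a replica's latency score must exceed the best
# replica's before reads are routed away from it -- the "$threshold"
# mentioned above. 0 means always prefer the lowest-scoring node.
dynamic_snitch_badness_threshold: 0.1

# How often latency scores are recalculated (milliseconds).
dynamic_snitch_update_interval_in_ms: 100

# How often scores are reset so previously penalized nodes get a
# fresh chance to receive traffic (milliseconds).
dynamic_snitch_reset_interval_in_ms: 600000
```

With the default badness threshold of 0.1, a node only loses traffic once it
scores 10% worse than the best alternative, which is exactly the kind of
silent shift that can masquerade as a successful tuning change.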