> the purpose of your thread is: How far are you away from being I/O
> bound (say in terms of % utilization - last column of iostat -x 1 -
> assuming you don't have a massive RAID underneath the block device)
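(For reference: that %util column is essentially the delta of the
"time spent doing I/Os" field in /proc/diskstats divided by the sample
interval. If sysstat isn't installed on a node, a rough sketch like the
one below gives roughly the same number - the device name and interval
are just placeholders, adjust to whatever block device holds your data
directory:)

    # Rough %util sampler, same idea as the last column of iostat -x:
    # read the "time spent doing I/Os" field (ms) from /proc/diskstats
    # twice and divide by the wall-clock interval.
    import time

    DEV = "sda"          # placeholder: your data directory's device
    INTERVAL = 1.0       # seconds, like iostat -x 1

    def io_ticks_ms(dev):
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == dev:
                    return int(fields[12])   # ms spent doing I/Os
        raise ValueError("device %s not found" % dev)

    while True:
        before = io_ticks_ms(DEV)
        time.sleep(INTERVAL)
        after = io_ticks_ms(DEV)
        util = 100.0 * (after - before) / (INTERVAL * 1000.0)
        print("%s util: %.1f%%" % (DEV, min(util, 100.0)))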
No, my cheap boss didn't want to buy me a stack of these:
http://www.ocztechnology.com/products/solid-state-drives/pci-express/z-drive-r2/mlc-performance-series/ocz-z-drive-r2-p88-pci-express-ssd.html

But seriously: we don't know yet what the best way in terms of TCO is.
Maybe it's worth investing 2k in SSDs if that machine could then handle
the load of 3.

> when compaction/AES is *not* running? I.e., how much in relative terms,
> in terms of "time spent by disks servicing requests" is added by
> compaction/AES?

Can't really say in terms of util% because we only monitor IO waits in
Zabbix. Now, with our cluster running smoothly, I'd say compaction adds
around 15-20%. In terms of IO waits, our graphs jumped during
compactions:

- from 20-30% to 50% under 'ok' load (reqs were handled at around 100ms
  max and no messages dropped), and
- from 50% to 80-90% during peak hours. Things got ugly then.

> Are your values generally largish (say a few kb or some such) or
> very small (5-50 bytes) or somewhere in between? I've been trying to
> collect information when people report compaction/repair killing their
> performance. My hypothesis is that most severe issues are for data sets
> where compaction becomes I/O bound rather than CPU bound (for those
> that have seen me say this a gazillion times I must be sounding like
> I'm a stuck LP record); and this would tend to be expected with larger
> and fewer values as opposed to smaller and more numerous values as the
> latter is much more expensive in terms of CPU cycles per byte
> compacted. Further I expect CPU bound compaction to be a problem very
> infrequently in comparison. I'm trying to confirm or falsify the
> hypothesis.

Well, we have 4 CFs with different characteristics, but it seems that
what made things go wrong was a CF with ~2k cols. I have never seen CPU
user time over 30% on any of the nodes, so I second your hypothesis.

>
> --
> / Peter Schuller
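If anyone wants to watch the same split (CPU user time vs. IO wait) on
a node without Zabbix, here is a rough sketch that samples the aggregate
"cpu" line in /proc/stat; the 1-second interval is arbitrary:

    # Rough CPU breakdown sampler: user% vs iowait%, aggregated over all
    # cores, computed from deltas of the "cpu" line in /proc/stat.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = f.readline().split()[1:]   # first line is the "cpu" aggregate
        return [int(x) for x in fields]

    prev = cpu_times()
    while True:
        time.sleep(1.0)
        cur = cpu_times()
        delta = [c - p for c, p in zip(cur, prev)]
        total = sum(delta) or 1
        user_pct = 100.0 * (delta[0] + delta[1]) / total   # user + nice
        iowait_pct = 100.0 * delta[4] / total              # iowait column
        print("user %.1f%%  iowait %.1f%%" % (user_pct, iowait_pct))
        prev = cur

These are the same deltas top and vmstat report as "us" and "wa", so
this is only meant for quick ad-hoc checks during a compaction.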