> the purpose of your thread is: How far are you away from being I/O
> bound (say in terms of % utilization - last column of iostat -x 1 -
> assuming you don't have a massive RAID underneath the block device)

No, my cheap boss didn't want to buy me a stack of these:
http://www.ocztechnology.com/products/solid-state-drives/pci-express/z-drive-r2/mlc-performance-series/ocz-z-drive-r2-p88-pci-express-ssd.html

But seriously: we don't yet know what the best way in terms of TCO is. Maybe 
it's worth investing 2k in SSDs if that machine could then handle the load of 3.


> when compaction/AES is *not* running? I.e., how much in relative terms,
> in terms of "time spent by disks servicing requests" is added by
> compaction/AES?
> 

Can't really say in terms of util% because we only monitor I/O waits in Zabbix 
(a rough way we could sample util% directly is sketched after the list below). 
Now with our cluster running smoothly I'd say compaction adds around 15-20%. 
In terms of I/O waits we saw our graphs jump during compactions

- from 20-30% to 50% with 'ok' load (requests were handled at around 100 ms max 
and no messages dropped), and
- from 50% to 80-90% during peak hours. Things got ugly then.
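
Here is that sketch: a minimal Python script, assuming sysstat's iostat is 
installed and that %util is the last column of 'iostat -x 1' output (as you 
described); the device name "sda" is a placeholder for whatever block device 
sits under the Cassandra data directory.

    #!/usr/bin/env python
    # Minimal sketch: print the %util column (last field) of 'iostat -x 1'
    # for a single device. DEVICE is a placeholder -- point it at the
    # block device under your data directory.
    import subprocess

    DEVICE = "sda"  # placeholder device name

    proc = subprocess.Popen(["iostat", "-x", "1"],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    for line in proc.stdout:
        fields = line.split()
        if fields and fields[0] == DEVICE:
            print("%util = " + fields[-1])

Feeding that number into Zabbix as a custom item would let us answer the 
util% question directly next time.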

> Are your values generally largish (say a few kb or some such) or
> very small (5-50 bytes) or somewhere in between? I've been trying to
> collect information when people report compaction/repair killing their
> performance. My hypothesis is that most severe issues are for data sets
> where compaction becomes I/O bound rather than CPU bound (for those
> that have seen me say this a gazillion times I must be sounding like
> I'm a stuck LP record); and this would tend to be expected with larger
> and fewer values as opposed to smaller and more numerous values as the
> latter is much more expensive in terms of CPU cycles per byte
> compacted. Further I expect CPU bound compaction to be a problem very
> infrequently in comparison. I'm trying to confirm or falsify the
> hypothesis.

Well, we have 4 CFs with different characteristics, but it seems that what made 
things go wrong was a CF with ~2k columns. I have never seen CPU user time over 
30% on any of the nodes, so I second your hypothesis.
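
To double-check the I/O-bound vs CPU-bound question the next time compaction 
runs, something like this rough Python sketch (assuming the standard Linux 
/proc/stat layout) would compare user time against iowait over an interval:

    #!/usr/bin/env python
    # Rough sketch: sample the aggregate "cpu" line in /proc/stat twice and
    # report the share of time spent in user vs iowait over the interval.
    # If iowait dominates while compaction runs, compaction is I/O bound;
    # if user time dominates, it is CPU bound.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            # first line: cpu user nice system idle iowait irq softirq ...
            return [int(x) for x in f.readline().split()[1:]]

    before = cpu_times()
    time.sleep(10)  # sample interval; stretch it to span a compaction
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    total = float(sum(delta))
    print("user   %5.1f%%" % (100 * delta[0] / total))
    print("iowait %5.1f%%" % (100 * delta[4] / total))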

> -- 
> / Peter Schuller
