On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > the servers spending >50% of the time in io-wait
>
> Note that I/O wait is not necessarily a good indicator, depending on the
> situation. In particular, if you have multiple drives, I/O wait can
> mostly be ignored. Similarly, if you have non-trivial CPU usage in
> addition to disk I/O, it is also not a good indicator. I/O wait
> essentially gives you the amount of time CPUs spend doing nothing
> because the only processes that would otherwise be runnable are
> waiting on disk I/O. But even a single process waiting on disk I/O ->
> lots of I/O wait even if you have 24 drives.

Yep - user space CPU is <20%, or much worse when the io-wait goes into
the 90s - it looks a great deal like an I/O bottleneck.

> The per-disk % utilization is generally a much better indicator
> (assuming no hardware RAID device, and assuming no SSD), along with
> the average queue size.

I doubt that figure is available sensibly in an EC2 instance.

> >> In general, if you have queries that come in at some rate that
> >> is determined by outside sources (rather than by the time the last
> >> query took to execute),
> >
> > That's an interesting approach - is that likely to give close to
> > optimal performance?
>
> I just mean that it all depends on the situation. If you have, for
> example, some N number of clients that are doing work as fast as they
> can, bottlenecking only on Cassandra, you're essentially saturating
> the Cassandra cluster no matter what (until the client/network becomes
> a bottleneck). Under such conditions (saturation) you generally should
> never expect good latencies.
>
> For most non-batch-job production use cases, you tend to have incoming
> requests driven by something external, such as user behavior or
> automated systems not related to the Cassandra cluster. In these cases,
> you tend to have a certain amount of incoming requests at any given
> time that you must serve within a reasonable time frame, and that's
> where the question comes in of how much I/O you're doing in relation
> to the maximum. For good latencies, you always want to be significantly
> below maximum - particularly when platter-based disk I/O is involved.
>
> > That may well explain it - I'll have to think about what that means
> > for our use case as load will be extremely bursty
>
> To be clear though, even your typical un-bursty load is still bursty
> once you look at it at sufficient resolution, unless you have
> something specifically ensuring that it is entirely smooth. A
> completely random distribution over time, for example, would look very
> even on almost any graph you can imagine unless you have sub-second
> resolution, but would still exhibit unevenness and have an effect on
> latency.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

--
Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 9236 9118
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
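
For reference, the per-disk %util and average queue size discussed above are
what "iostat -x" reports as %util and avgqu-sz. A rough Python sketch of how
to approximate them by sampling /proc/diskstats directly follows; the device
name "xvdb" and the 5-second interval are just example values, and on EC2 the
figures describe the virtual block device, not the physical disks behind it:

#!/usr/bin/env python
# Rough sketch: approximate iostat's %util and avgqu-sz for one device
# by sampling /proc/diskstats twice. "xvdb" is only an example name.
import time

DEVICE = "xvdb"        # example device name
INTERVAL = 5.0         # seconds between samples

def read_stats(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                io_ms = int(fields[12])        # field 13: time spent doing I/O (ms)
                weighted_ms = int(fields[13])  # field 14: weighted time doing I/O (ms)
                return io_ms, weighted_ms
    raise ValueError("device %s not found" % device)

io1, w1 = read_stats(DEVICE)
time.sleep(INTERVAL)
io2, w2 = read_stats(DEVICE)

elapsed_ms = INTERVAL * 1000.0
util_pct = 100.0 * (io2 - io1) / elapsed_ms   # roughly iostat's %util
avg_queue = (w2 - w1) / elapsed_ms            # roughly iostat's avgqu-sz

print("%%util: %.1f  avg queue size: %.2f" % (util_pct, avg_queue))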
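
On staying significantly below maximum: a back-of-the-envelope way to see why
is a single-server queueing model, where mean response time grows roughly as
service_time / (1 - utilization). The 8 ms service time below is a made-up
figure, not a measurement from any cluster discussed here:

# Back-of-the-envelope: with a simple M/M/1 queue as a stand-in for a
# single busy disk, mean response time is service_time / (1 - utilization),
# so latency degrades sharply as utilization approaches 100%.
SERVICE_TIME_MS = 8.0   # made-up average time per I/O

for util in (0.3, 0.5, 0.7, 0.9, 0.95, 0.99):
    latency_ms = SERVICE_TIME_MS / (1.0 - util)
    print("utilization %2.0f%% -> mean response time %6.1f ms" % (util * 100, latency_ms))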
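
On the burstiness point, a quick simulation shows that a completely random
(Poisson) request stream looks flat when bucketed per minute but varies a lot
at 100 ms resolution. The 200 requests/second rate and ten-minute duration
below are arbitrary example figures:

# Simulate a completely random (Poisson) request stream and bucket it
# at two resolutions: per-minute counts look smooth, 100 ms counts do not.
import random
from collections import Counter

random.seed(1)
RATE = 200.0       # requests per second (arbitrary example)
DURATION = 600.0   # ten minutes of simulated traffic

arrivals, t = [], 0.0
while True:
    t += random.expovariate(RATE)   # exponential inter-arrival times
    if t >= DURATION:
        break
    arrivals.append(t)

def bucket_min_max(width):
    counts = Counter(int(a / width) for a in arrivals)
    values = [counts.get(i, 0) for i in range(int(DURATION / width))]
    return min(values), max(values)

print("per-minute buckets (min, max):", bucket_min_max(60.0))
print("per-100ms buckets (min, max):", bucket_min_max(0.1))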