> Yep - I've been looking at these - I don't see anything in iostat/dstat etc
> that point strongly to a problem. There is quite a bit of I/O load, but it
> looks roughly uniform on slow and fast instances of the queries. The last
> compaction ran 4 days ago - which was before I started seeing variable
> performance

[snip]

> I know why it is slow - it's clearly I/O bound. I am trying to hunt down why
> it is sometimes much faster even though I have (tried) to replicate the
> same conditions

What does "clearly I/O bound" mean, and what is "quite a bit" of I/O
load? In general, if you have queries that come in at some rate that
is determined by outside sources (rather than by the time the last
query took to execute), you will typically either get more queries
than your cluster can take, or fewer. If fewer, there is a
non-trivially sized grey area where overall I/O throughput needed is
lower than that available, but the closer you are to capacity the more
often requests have to wait for other I/O to complete, for purely
statistical reasons.

If you're running close to maximum capacity, it would be expected that
the variation in query latency is high.
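
As a rough illustration of that statistical effect, here is a toy
model, not a measurement of any real cluster. It assumes a single FIFO
server, Poisson arrivals and exponentially distributed service times,
none of which comes from this thread; simulate_latencies and its
parameters are hypothetical names chosen for the sketch.

```python
import random
import statistics


def simulate_latencies(utilization, n_requests=50_000, service_mean=1.0, seed=1):
    """Toy single-server FIFO queue (assumed model, not real measurements).

    Arrivals are Poisson and service times are exponential. Returns the
    latency (queueing delay plus service time) of every request.
    """
    rng = random.Random(seed)
    # Arrival rate chosen so that the server is busy `utilization` of the time.
    arrival_rate = utilization / service_mean
    wait = 0.0  # time the current request spends queued behind earlier ones
    latencies = []
    for _ in range(n_requests):
        service = rng.expovariate(1.0 / service_mean)
        latencies.append(wait + service)
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: the next request waits for whatever work is
        # still outstanding when it arrives, or not at all if the server is idle.
        wait = max(0.0, wait + service - interarrival)
    return latencies


if __name__ == "__main__":
    for rho in (0.5, 0.8, 0.95):
        lat = simulate_latencies(rho)
        print(
            f"utilization={rho:.2f} "
            f"mean={statistics.mean(lat):.2f} "
            f"stdev={statistics.pstdev(lat):.2f}"
        )
```

In this model both the mean and the spread of the latency grow sharply
as utilization approaches 1, even though the offered load is below
capacity the whole time. A real disk subsystem will differ in the
details, but the qualitative shape is the point.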

That said, if you're seeing consistently bad latencies for extended
periods, and at other times consistently good latencies, that sounds
like something different, but it would hopefully be observable somehow.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
