This makes sense, but from what I have seen, compaction's read contention with Cassandra's normal read traffic is a much bigger deal than its write contention (unless you don't have a separate device for your commitlog, but optimizing for that case isn't one of our goals).
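As a rough illustration of the read side, here is a minimal C sketch of the first idea in the quoted message: a compaction-style sequential scan that hints POSIX_FADV_SEQUENTIAL, then calls POSIX_FADV_DONTNEED on each range it has finished with, so the scan is less likely to push pages serving live reads out of the page cache. The helper name scan_dropping_cache is hypothetical and nothing here is Cassandra code. It assumes Linux/glibc, and posix_fadvise() is purely advisory, so the kernel may ignore the hints. Calling this from the JVM would need native glue, as the quoted message notes.

```c
#define _POSIX_C_SOURCE 200809L /* expose posix_fadvise() */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read a file front to back in `chunk`-byte pieces
 * (as a compaction pass reads an input sstable), summing the bytes to
 * stand in for real work. Returns total bytes read, or -1 on error. */
long long scan_dropping_cache(const char *path, size_t chunk, uint64_t *sum_out)
{
    if (chunk == 0) {
        errno = EINVAL;
        return -1;
    }
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with len 0 covers the whole file: expect sequential access. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    uint64_t sum = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of file */
        for (ssize_t i = 0; i < n; i++)
            sum += (unsigned char)buf[i];
        /* This range will not be read again by the scan: ask the kernel
         * to drop it rather than evict someone else's hot pages. */
        (void)posix_fadvise(fd, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
        total += n;
    }

    free(buf);
    close(fd);
    if (sum_out != NULL && total >= 0)
        *sum_out = sum;
    return total;
}
```

Dropping pages only after they have been consumed keeps read-ahead useful for the scan itself while limiting how much of the cache a single compaction can occupy.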
On Wed, Jul 7, 2010 at 12:09 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> Hello,
>
> I have repeatedly seen users report that background compaction is
> overly detrimental to the behavior of the node with respect to
> latency. While I have not yet deployed cassandra in a production
> situation where latencies are closely monitored, these reports do not
> really sound very surprising to me given the nature of compaction and
> unless otherwise stated by developers here on the list I tend to
> believe that it is a real issue.
>
> Ignoring implementation difficulties for a moment, a few things that
> could improve the situation, that seem sensible to me, are:
>
> * Utilizing posix_fadvise() on both reads and writes to avoid
> obliterating the operating system's caching of the sstables.
> * Add the ability to rate limit disk I/O (in particular writes).
> * Add the ability to perform direct I/O.
> * Add the ability to fsync() regularly on writes to force the
> operating system to not decide to flush hundreds of megabytes of data
> out in a single burst.
> * (Not an improvement but general observation: it seems useless for
> writes to the commit log to remain in cache after an fsync(), and so
> they are a good candidate for posix_fadvise())
>
> None of these would be silver bullets, and the importance and
> appropriate settings for each would be very dependent on operating
> system, hardware, etc. But having the ability to control some or all
> of these should, I suspect, significantly lessen the impact
> of compaction under a variety of circumstances.
>
> With respect to cache eviction, this is one area where the impact
> can probably be expected to be higher the more you rely on the
> operating system's caching, and the less you rely on in-JVM caching
> done by cassandra.
>
> The most obvious problem points to me include:
>
> * posix_fadvise() and direct I/O cause portability and building
> issues, necessitating native code.
> * rate limiting is very indirect due to read-ahead, caching, etc. In
> particular, for writes, rate limiting them would likely be almost
> useless without also having fsync() or direct I/O, unless it is rate
> limited to an extremely small amount and the cluster is taking very
> few writes (such that the typical background flushing done by most
> OSes is done often enough to not imply huge amounts of data)
>
> Any thoughts? Has this already been considered and rejected? Do you
> think compaction is in fact not a problem already? Are there other,
> easier, better ways to accomplish the goal?
>
> --
> / Peter Schuller

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
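The write-side suggestions in the quoted message (rate limiting, regular fsync(), and posix_fadvise() once data is on disk) fit together naturally, because on Linux POSIX_FADV_DONTNEED generally only drops clean pages, so syncing first is what lets the hint take effect. The following is a minimal C sketch under those assumptions (Linux/glibc). The function throttled_write and its parameters are hypothetical, not a Cassandra or library API, and fdatasync() stands in for fsync() since only the file data needs to reach disk here.

```c
#define _POSIX_C_SOURCE 200809L /* expose posix_fadvise(), fdatasync() */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: write len bytes from buf to fd at its current
 * offset in `chunk`-byte pieces, averaging at most bytes_per_sec
 * (<= 0 disables pacing), and calling fdatasync() every sync_every bytes
 * so dirty data never piles up into one large flush. Each synced (now
 * clean) range is then offered to the kernel for eviction.
 * Returns 0 on success, -1 on error with errno set. */
int throttled_write(int fd, const char *buf, size_t len, size_t chunk,
                    size_t sync_every, double bytes_per_sec)
{
    if (chunk == 0 || sync_every == 0) {
        errno = EINVAL;
        return -1;
    }
    off_t base = lseek(fd, 0, SEEK_CUR);
    if (base == (off_t)-1)
        return -1;

    double start = now_seconds();
    size_t written = 0, synced = 0;

    while (written < len) {
        size_t want = len - written < chunk ? len - written : chunk;
        ssize_t n = write(fd, buf + written, want);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        written += (size_t)n;

        /* Force data out in bounded increments, not one big burst. */
        if (written - synced >= sync_every || written == len) {
            if (fdatasync(fd) != 0)
                return -1;
            /* The range is clean now, so dropping it is cheap. */
            (void)posix_fadvise(fd, base + (off_t)synced,
                                (off_t)(written - synced), POSIX_FADV_DONTNEED);
            synced = written;
        }

        /* Pace: after `written` bytes, at least written / bytes_per_sec
         * seconds should have elapsed since start; sleep off any excess. */
        if (bytes_per_sec > 0) {
            double ahead = (double)written / bytes_per_sec - (now_seconds() - start);
            if (ahead > 0) {
                struct timespec ts;
                ts.tv_sec = (time_t)ahead;
                ts.tv_nsec = (long)((ahead - (double)ts.tv_sec) * 1e9);
                if (ts.tv_nsec > 999999999L)
                    ts.tv_nsec = 999999999L;
                nanosleep(&ts, NULL);
            }
        }
    }
    return 0;
}
```

The periodic sync is what gives the pacing teeth: as the quoted message points out, throttling write() calls alone only throttles how fast pages get dirtied, and the kernel could still write all of them out in a single burst.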