This makes sense, but from what I have seen, read contention vs
cassandra is a much bigger deal than write contention (unless you
don't have a separate device for your commitlog, but optimizing for
that case isn't one of our goals).

On Wed, Jul 7, 2010 at 12:09 PM, Peter Schuller
<peter.schul...@infidyne.com> wrote:
> Hello,
>
> I have repeatedly seen users report that background compaction is
> overly detrimental to the behavior of the node with respect to
> latency. While I have not yet deployed cassandra in a production
> situation where latencies are closely monitored, these reports do not
> really sound very surprising to me given the nature of compaction, and
> unless otherwise stated by developers here on the list, I tend to
> believe that it is a real issue.
>
> Ignoring implementation difficulties for a moment, a few things that
> could improve the situation, that seem sensible to me, are:
>
> * Utilizing posix_fadvise() on both reads and writes to avoid
> obliterating the operating system's caching of the sstables.
> * Add the ability to rate limit disk I/O (in particular writes).
> * Add the ability to perform direct I/O.
> * Add the ability to fsync() regularly on writes to force the
> operating system to not decide to flush hundreds of megabytes of data
> out in a single burst.
> * (Not an improvement but general observation: it seems useless for
> writes to the commit log to remain in cache after an fsync(), and so
> they are a good candidate for posix_fadvise())
>
> None of these would be silver bullets, and the importance and
> appropriate settings for each would be very dependent on operating
> system, hardware, etc. But having the ability to control some or all
> of these should, I suspect, allow significantly lessening the impact
> of compaction under a variety of circumstances.
>
> With respect to cache eviction, this is one area where the impact
> can probably be expected to be higher the more you rely on the
> operating system's caching, and the less you rely on in-JVM caching
> done by cassandra.
>
> The most obvious problem points to me include:
>
> * posix_fadvise() and direct I/O cause portability and building
> issues, necessitating native code.
> * rate limiting is very indirect due to read-ahead, caching, etc. In
> particular for writes, rate limiting them would likely be almost
> useless without also having fsync() or direct I/O, unless it is rate
> limited to an extremely small amount and the cluster is taking very
> few writes (such that the typical background flushing done by most
> OSes is done often enough to not imply huge amounts of data).
>
> Any thoughts? Has this already been considered and rejected? Do you
> think compaction is in fact not a problem already? Are there other,
> easier, better ways to accomplish the goal?
>
> --
> / Peter Schuller
>
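
For illustration only, the write-side ideas in the quoted list (rate limiting, regular fsync(), and posix_fadvise() so the page cache is not obliterated) could be combined roughly as below. This is a sketch in Python, not Cassandra code: the function name throttled_copy and its parameters (chunk_size, sync_every, max_bytes_per_sec) are made up for the example, and os.posix_fadvise is only available on Unix-like systems, so the sketch skips it where it is missing.

```python
import os
import time


def throttled_copy(src_path, dst_path, chunk_size=1 << 20,
                   sync_every=8 << 20, max_bytes_per_sec=None):
    """Copy src_path to dst_path, fsyncing every `sync_every` bytes.

    Hypothetical helper for illustration. Returns the number of fsync() calls.
    """
    # posix_fadvise is Unix-only; treat it as optional.
    fadvise = getattr(os, "posix_fadvise", None)
    syncs = 0
    written = 0
    unsynced = 0
    start = time.monotonic()

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
            unsynced += len(chunk)

            if unsynced >= sync_every:
                # Flush in small, regular steps so the OS does not build up
                # hundreds of megabytes of dirty pages and write them at once.
                dst.flush()
                os.fsync(dst.fileno())
                syncs += 1
                unsynced = 0
                if fadvise is not None:
                    try:
                        # Advisory hint: data copied so far need not stay cached.
                        fadvise(dst.fileno(), 0, written, os.POSIX_FADV_DONTNEED)
                        fadvise(src.fileno(), 0, written, os.POSIX_FADV_DONTNEED)
                    except OSError:
                        pass  # the hint is optional; ignore if unsupported

            if max_bytes_per_sec:
                # Crude rate limit: sleep until we are back under the budget.
                delay = written / max_bytes_per_sec - (time.monotonic() - start)
                if delay > 0:
                    time.sleep(delay)

        # Final flush for whatever remains since the last periodic fsync.
        dst.flush()
        os.fsync(dst.fileno())
        syncs += 1
    return syncs
```

Calling throttled_copy("old.db", "new.db", max_bytes_per_sec=16 << 20) would cap the copy at roughly 16 MiB/s. As the quoted message notes, the rate limit alone does little for writes; it is the periodic fsync() that bounds how much dirty data the OS can accumulate between flushes.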



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
