I have done some bulk write performance tests and saw background compaction having a big detrimental impact on write performance. I was also wondering whether there is a tunable to limit how often compaction runs on the sstables. If not, adding such a configuration option would also help in controlling the performance impact of compaction.
-Rishi

________________________________
From: Peter Schuller <peter.schul...@infidyne.com>
To: dev@cassandra.apache.org
Sent: Wed, July 7, 2010 10:09:25 AM
Subject: Minimizing the impact of compaction on latency and throughput

Hello,

I have repeatedly seen users report that background compaction is overly detrimental to the node's latency. I have not yet deployed Cassandra in a production situation where latencies are closely monitored. Still, given the nature of compaction, these reports do not sound very surprising to me, and unless the developers here on the list say otherwise I tend to believe that it is a real issue.

Ignoring implementation difficulties for a moment, a few things that seem sensible to me and could improve the situation are:

* Using posix_fadvise() on both reads and writes to avoid obliterating the operating system's caching of the sstables.
* Adding the ability to rate limit disk I/O (in particular writes).
* Adding the ability to perform direct I/O.
* Adding the ability to fsync() regularly on writes, so that the operating system does not decide to flush hundreds of megabytes of data in a single burst.
* (Not an improvement, but a general observation: there seems to be no point in keeping commit log writes in cache after an fsync(), so they are a good candidate for posix_fadvise().)

None of these would be silver bullets, and the importance and appropriate settings of each would depend heavily on the operating system, hardware, etc. But having the ability to control some or all of them should, I suspect, make it possible to significantly lessen the impact of compaction under a variety of circumstances.

With respect to cache eviction, this is one area where the impact can probably be expected to be higher the more you rely on the operating system's caching, and the less you rely on the in-JVM caching done by Cassandra.
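To make the rate-limiting and regular-fsync() ideas more concrete, here is a minimal sketch in Java (Cassandra's implementation language) of a writer that combines the two. It caps throughput at a target byte rate and forces data to disk every N bytes, so throttled writes do not simply pile up in the page cache and get flushed in one burst. The class name ThrottledSyncingWriter and its bytesPerSecond and syncIntervalBytes knobs are hypothetical and are not part of Cassandra. posix_fadvise() is not shown because, as noted in the thread, it requires native code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch (not Cassandra's API): a writer that caps its
 * throughput at bytesPerSecond and forces data to disk every
 * syncIntervalBytes, so the OS never holds a large dirty backlog.
 */
public final class ThrottledSyncingWriter implements AutoCloseable {
    private final FileChannel channel;
    private final long bytesPerSecond;    // hypothetical rate-limit knob
    private final long syncIntervalBytes; // hypothetical fsync-interval knob
    private final long startNanos;
    private long totalWritten = 0;
    private long unsyncedBytes = 0;
    private int periodicSyncs = 0;

    public ThrottledSyncingWriter(Path path, long bytesPerSecond, long syncIntervalBytes)
            throws IOException {
        if (bytesPerSecond <= 0 || syncIntervalBytes <= 0) {
            throw new IllegalArgumentException("rate and sync interval must be positive");
        }
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        this.bytesPerSecond = bytesPerSecond;
        this.syncIntervalBytes = syncIntervalBytes;
        this.startNanos = System.nanoTime();
    }

    /** Writes the whole buffer, syncing and sleeping as needed. */
    public void write(ByteBuffer buf) throws IOException, InterruptedException {
        int n = buf.remaining();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        totalWritten += n;
        unsyncedBytes += n;

        // Force file contents (not necessarily metadata) to storage,
        // comparable to fdatasync(), once enough unsynced data has accumulated.
        if (unsyncedBytes >= syncIntervalBytes) {
            channel.force(false);
            unsyncedBytes = 0;
            periodicSyncs++;
        }

        // If we are ahead of the target rate, sleep until back on schedule.
        // A double avoids long overflow for multi-gigabyte compactions.
        long targetNanos = (long) ((double) totalWritten / bytesPerSecond * 1e9);
        long aheadNanos = targetNanos - (System.nanoTime() - startNanos);
        if (aheadNanos > 0) {
            Thread.sleep(aheadNanos / 1_000_000, (int) (aheadNanos % 1_000_000));
        }
    }

    /** Number of syncs triggered by syncIntervalBytes (excludes the final one in close()). */
    public int periodicSyncs() {
        return periodicSyncs;
    }

    @Override
    public void close() throws IOException {
        try {
            channel.force(true); // final flush, including metadata
        } finally {
            channel.close();
        }
    }
}
```

Sleeping after each write paces only the average rate, so short bursts up to one buffer in size still occur. The sync interval bounds how much dirty data the OS can accumulate between flushes. Choosing good values for both would, as the post says, depend heavily on the operating system and hardware.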
The most obvious problems I see are:

* posix_fadvise() and direct I/O cause portability and build issues, since they require native code.
* Rate limiting is very indirect because of read-ahead, caching, etc. For writes in particular, rate limiting would likely be almost useless without also having fsync() or direct I/O, unless the rate is limited to an extremely small amount and the cluster is taking very few writes (so that the background flushing most OSes do happens often enough that it never involves huge amounts of data).

Any thoughts? Has this already been considered and rejected? Do you think compaction is in fact not a problem already? Are there other, easier, better ways to accomplish the goal?

-- 
/ Peter Schuller