> This is what I thought. I was wishing there might be another way to reclaim
> the space.
Be sure you really need this first :) Normally you just let it happen in the background.

> The problem is that the more data you have the more time it will take to
> Cassandra to response.

Relative to what, though? There are definitely important side effects of having very large data sets, and part of that involves compactions, but in a normal steady-state system you should never be in the position to "wait" for a major compaction to run. Compactions are intended to run every now and then in the background. They will cause disk space usage to vary within certain bounds, which is expected.

Certainly the situation can be improved, and current disk space utilization is not perfect, but the above suggests to me that you're trying to do something that is not really intended to be done.

> Reclaim space of deleted rows in the biggest SSTable requires Major
> compaction. This compaction can be triggered by adding x2 data (or x4 data
> in the default configuration) to the system or by executing it manually
> using JMX.

You can indeed choose to trigger major compactions by e.g. cron jobs. But be aware that if you're operating close to running out of disk space, you have other concerns too, such as periodic repair operations also needing disk space.

Also, suppose you're overwriting lots of data (or replacing it by deleting and adding other data). It is not necessarily true that you need 4x the space you would otherwise need just because of the compaction threshold. Keep in mind that compactions already need extra space anyway. If you're *not* overwriting or deleting data, a compaction of a single CF is expected to need up to twice the amount of space that it occupies. If you're doing more overwrites and deletions, though, as you point out you will have more "dead" data at any given point in time. But on the other hand, the peak disk space usage during compactions is lower, relative to the space the CF occupies.
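The relationship between dead data and the compaction peak can be checked with back-of-the-envelope arithmetic. The Python sketch below uses a simplified model assumed purely for illustration, not Cassandra's actual space accounting: the input SSTables stay on disk until the merged output is fully written, and the output contains only live data. The function name `compaction_peak` and the 100 GB figure are hypothetical.

```python
def compaction_peak(occupied_gb: float, dead_fraction: float) -> float:
    """Rough peak disk usage (GB) while compacting one column family.

    Simplified model (an assumption, not Cassandra's exact behaviour):
    all input SSTables remain on disk until the merged output is complete,
    and the merged output contains only the live (non-obsolete) data.
    """
    if occupied_gb < 0:
        raise ValueError("occupied_gb must be non-negative")
    if not 0.0 <= dead_fraction <= 1.0:
        raise ValueError("dead_fraction must be between 0 and 1")
    # The new SSTable holds only the data that survives the merge.
    live_gb = occupied_gb * (1.0 - dead_fraction)
    # Old inputs + new output coexist at the moment the merge finishes.
    return occupied_gb + live_gb


if __name__ == "__main__":
    occupied = 100.0  # hypothetical GB on disk before the compaction starts
    for dead in (0.0, 0.25, 0.5, 0.75):
        peak = compaction_peak(occupied, dead)
        print(f"dead={dead:.0%}  peak={peak:.0f} GB  ratio={peak / occupied:.2f}x")
```

Under these assumptions the peak is (2 - dead_fraction) times the occupied size: 2x with no dead data, and lower as the dead fraction grows. The same arithmetic shows why compacting more often raises the relative spike: less data has become dead by the time each compaction runs.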
So the actual peak disk space usage (which is what matters, since you must have that much disk space) is actually helped by the deletions/overwrites too.

Further, suppose you trigger major compactions more often. That means each compaction will have a higher relative spike in disk usage, because less data has had time to be overwritten or removed. So in a sense, the disk space demand is being moved between the categories of "dead data retained for longer than necessary" and "peak disk usage during compaction".

Also keep in mind that the *low* point of disk space usage is not subject to any fragmentation concerns. Depending on the size of your data compared to e.g. column names, that disk space usage might be significantly lower than what you would get with a database that updates in place.

There are lots of trade-offs :)

You say you have to "wait" for deletions, though, which sounds like you're doing something unusual. Are you doing something like deleting lots of data in bulk from one CF, only to then write data to *another* CF, such that you actually have to wait for disk space to be freed to make room for data somewhere else?

> In case of a system that deletes data regularly, which needs to serve
> customers all day and the time it takes should be in ms, this is a problem.

Not in general. I am afraid there may be some misunderstanding here. Unless disk space is a problem for you (i.e., you're running out of space), there is no need to wait for compactions. Whether you can serve traffic 24/7 at low-ms latencies is certainly an important consideration, and it does become complex when disk I/O is involved, but it is not about disk *space*. If you have important performance requirements, make sure you can service the read load at all given your data set size. If you're running out of disk, I presume your data is big. See http://wiki.apache.org/cassandra/LargeDataSetConsiderations

Perhaps you could describe your situation in more detail?
> It appears to me that in order to use Cassandra you must have a process that
> will trigger major compaction on the nodes once in X amount of time.

For some cases this will be beneficial, but not always. It has also been further improved for 0.7 with respect to tombstone handling in non-major compactions (I don't have the JIRA ticket number handy). It's certainly not a hard requirement, and would only ever be relevant if you're operating nodes that are significantly full.

> One case where you would do that is when you don't (or hardly) delete data.

Or just in most cases where you aren't pushing the limits of your disk space.

> Another one is when your upper limit of time it should take to response is
> very high so major compaction will not hurt you.

To be really clear: compaction is a background operation. It is never the case that reads or writes somehow "wait" for compaction to complete.

--
/ Peter Schuller