We've encountered a situation where compacted sstable files aren't deleted after node repair. Even when gc is triggered via jmx, it sometimes leaves compacted files behind. In one case, a lot of files were left, and some have already stayed for more than 10 hours. There is no guarantee that gc will clean up all compacted sstable files.
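For reference, here is a minimal sketch of what "triggering gc via jmx" amounts to on the JVM side. It assumes the standard platform Memory MBean (`java.lang:type=Memory`), whose `gc` operation is documented as effectively equivalent to `System.gc()`. The class name `JmxGcSketch` is made up for illustration, and it uses the in-process MBean server; a real setup would more likely reach the node through a remote JMX connector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper class, not part of Cassandra.
public class JmxGcSketch {

    /**
     * Invokes the "gc" operation on the platform Memory MBean and
     * returns the number of heap bytes in use after the request.
     */
    public static long requestGcViaJmx() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // MEMORY_MXBEAN_NAME is the string "java.lang:type=Memory".
        ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
        // The operation takes no arguments, so both arrays are empty.
        server.invoke(memory, "gc", new Object[0], new String[0]);

        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heap used after gc request: "
                + requestGcViaJmx() + " bytes");
    }
}
```

The call only requests a collection. Whether the compacted sstable files disappear afterwards depends on which objects the collector actually reclaims, which is the point debated in the thread below.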
We have a great interest in the following ticket.
https://issues.apache.org/jira/browse/CASSANDRA-2521

Regards,
Shotaro

On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman <jef...@gmail.com> wrote:
> I'm also not sure that will guarantee all space is cleaned up. It
> really depends on what you are doing inside Cassandra. If you have
> your own garbage collection that is just in some way tied to the gc run,
> then it will run when it runs.
>
> If otoh you are associating records in your storage with specific
> objects in memory and using one of the post-mortem hooks (finalize or
> PhantomReference) to tell you to clean up that particular record then
> it's quite possible they won't all get cleaned up. In general hotspot
> does not find and clean every candidate object on every GC run. It
> starts with the easiest/fastest to find and then sees what more it
> thinks it needs to do to create enough memory for anticipated near
> future needs.
>
> On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> In summary, System.gc works fine unless you've deliberately done
>> something like setting the -XX:+DisableExplicitGC flag.
>>
>> On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin
>> <konstant...@a-bb.net> wrote:
>>> So, in summary, there is no way to predictably and efficiently tell
>>> Cassandra to get rid of all of the extra space it is using on disk?
>>>
>>> ----- Original Message -----
>>> From: "Jeffrey Kesselman" <jef...@gmail.com>
>>> To: user@cassandra.apache.org
>>> Sent: Thursday, May 26, 2011 8:57:49 PM
>>> Subject: Re: Forcing Cassandra to free up some space
>>>
>>> Which JVM? Which collector? There have been and continue to be many.
>>>
>>> Hotspot itself supports a number of different collectors with
>>> different behaviors. Many of them do not collect every candidate on
>>> every gc, but merely the easiest ones to find. This is why depending
>>> on finalizers is a *bad* idea in java code. They may well never get
>>> run.
>>> (Finalizer is one of a few features the Sun Java team always
>>> regretted putting in Java to start with. It has caused quite a few
>>> application problems over the years)
>>>
>>> The really important thing is that NONE of these behaviors of the
>>> collectors are guaranteed by specification not to change from version
>>> to version. Basing your code on non-specified behaviors is a good way
>>> to hit mysterious failures on updates.
>>>
>>> For instance, in the mid 90s, IBM had a mode of their VM called
>>> "infinite heap." It *never* garbage collected, even if you called
>>> System.gc. Instead it just threw away address space and counted on
>>> the total memory needs for the life of the program being less than the
>>> total addressable space of the processor.
>>>
>>> It was *very* fast for certain kinds of applications.
>>>
>>> Far from being pedantic, not depending on undocumented behavior is
>>> simply good engineering.
>>>
>>>
>>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> I've read the relevant source. While you're pedantically correct re
>>>> the spec, you're wrong as to what the JVM actually does.
>>>>
>>>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman <jef...@gmail.com>
>>>> wrote:
>>>>> Some references...
>>>>>
>>>>> "An object enters an unreachable state when no more strong references
>>>>> to it exist. When an object is unreachable, it is a candidate for
>>>>> collection. Note the wording: Just because an object is a candidate
>>>>> for collection doesn't mean it will be immediately collected. The JVM
>>>>> is free to delay collection until there is an immediate need for the
>>>>> memory being consumed by the object."
>>>>>
>>>>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
>>>>>
>>>>> and "Calling the gc method suggests that the Java Virtual Machine
>>>>> expend effort toward recycling unused objects"
>>>>>
>>>>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
>>>>>
>>>>> It goes on to say that the VM will make a "best effort", but "best
>>>>> effort" is *deliberately* left up to the definition of the gc
>>>>> implementor.
>>>>>
>>>>> I guess you missed the many lectures I have given on this subject over
>>>>> the years at JavaOne conferences....
>>>>>
>>>>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>> It's a common misunderstanding that System.gc is only a suggestion; on
>>>>>> any VM you're likely to run Cassandra on, System.gc will actually
>>>>>> invoke a full collection.
>>>>>>
>>>>>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman <jef...@gmail.com>
>>>>>> wrote:
>>>>>>> Actually this is no guarantee. It's a common misunderstanding that
>>>>>>> System.gc "forces" gc. It does not. It is a suggestion only. The VM
>>>>>>> always has the option as to when and how much it gcs.
>>>>>>>
>>>>>>> On May 26, 2011 2:51 PM, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> Project Chair, Apache Cassandra
>>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>>> http://www.datastax.com
>>>>>>
>>>>>
>>>>> --
>>>>> It's always darkest just before you are eaten by a grue.
--
Shotaro Kamio