...though interestingly, the snapshots of these CFs have the "right" amount of
data in them (i.e. they agree with the live SSTable sizes reported by
Cassandra). Is it total insanity to remove the files from the data directory
that are not included in the snapshot, so long as they were created before the
snapshot?
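
For concreteness, the comparison I'd be making: since a snapshot is just hard
links into the data directory, the files to remove would be the live *.db
files whose inodes the snapshot doesn't reference. A rough sketch only - the
keyspace/CF/snapshot names are placeholders for whatever your layout actually
is, and it assumes GNU find:

  # inodes referenced by the snapshot
  find /var/lib/cassandra/data/MyKS/MyCF/snapshots/mysnap \
      -name '*.db' -printf '%i\n' | sort > /tmp/snap.inodes
  # live *.db files whose inode is absent from the snapshot
  find /var/lib/cassandra/data/MyKS/MyCF -maxdepth 1 -name '*.db' \
      -printf '%i %p\n' | sort | join -v1 - /tmp/snap.inodes

I'd only act on that list with the node stopped, and only after spot-checking
a few files by hand.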

On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:

> Have you cleaned up your snapshots? Those take extra space and don't just
> go away unless you delete them.
> 
> Dean
> 
> On 3/28/13 11:46 AM, "Ben Chobot" <be...@instructure.com> wrote:
> 
>> Are you also running 1.1.5? I'm wondering (ok hoping) that this might be
>> fixed if I upgrade.
>> 
>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>> 
>>> We occasionally (twice now on a 40-node cluster over the last 6-8
>>> months) see this.  My best guess is that Cassandra can somehow fail to
>>> mark an SSTable for cleanup.  Forced GCs or reboots don't clear them
>>> out.  We disable thrift and gossip; drain; snapshot; shut down; clear
>>> data/Keyspace/Table/*.db and restore from the just-created snapshot
>>> (hard-linking back into place to avoid data transfer); restart.
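>>> In nodetool terms the sequence is roughly the following - a sketch,
>>> with placeholder keyspace/CF names and paths, and the stop/start
>>> commands depending on how you run Cassandra:
>>> 
>>>   nodetool disablethrift
>>>   nodetool disablegossip
>>>   nodetool drain
>>>   nodetool snapshot -t rebuild MyKS
>>>   # stop cassandra here
>>>   cd /var/lib/cassandra/data/MyKS/MyCF
>>>   rm -f *.db                  # clear the live sstables
>>>   ln snapshots/rebuild/* .    # hard-link the snapshot back in; no copy
>>>   # start cassandra again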
>>> 
>>> 
>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot <be...@instructure.com> wrote:
>>> 
>>>> Some of the Cassandra nodes in my 1.1.5 cluster show a large
>>>> discrepancy between what Cassandra says the SSTables should sum to
>>>> and what df and du report. During repairs this is almost always
>>>> pretty bad, but post-repair compactions tend to bring those numbers to
>>>> within a few percent of each other... usually. Sometimes they remain
>>>> much further apart after compactions have finished. For instance, I'm
>>>> looking at one node now that claims to have 205GB of SSTables but
>>>> actually has 450GB of files living in that CF's data directory. There
>>>> are no pending compactions, and the most recent compaction for this CF
>>>> finished just a few hours ago.
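>>>> 
>>>> (Concretely, the numbers I'm comparing are "Space used" from nodetool
>>>> cfstats versus du on the CF's data directory - sketch, placeholder
>>>> names:
>>>> 
>>>>   nodetool cfstats | grep -A 3 'Column Family: MyCF'
>>>>   du -sh /var/lib/cassandra/data/MyKS/MyCF
>>>> 
>>>> cfstats reports both "Space used (live)" and "Space used (total)"; the
>>>> total includes obsolete sstables that haven't been deleted yet.)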
>>>> 
>>>> nodetool cleanup has no effect.
>>>> 
>>>> What could be causing these extra bytes, and how can I get them to go
>>>> away? I'm OK with a few extra GB of unexplained data, but an extra
>>>> 245GB (more than all the data this node is supposed to have!) is a
>>>> little extreme.
>>> 
>> 
> 
