Hi Ben,
If affordable, just blow away the node and bootstrap a replacement, or
restore from snapshot and repair.
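
Roughly something like this (untested; paths and names are placeholders for
your setup):

    # blow the node away and re-bootstrap:
    # stop cassandra, then
    rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
    # restart with auto_bootstrap on and let the node stream its data back,
    # or restore your snapshot into the CF directory and run:
    nodetool -h <node> repair MyKeyspace MyCF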

-Wei

----- Original Message -----
From: "Dean Hiller" <dean.hil...@nrel.gov>
To: user@cassandra.apache.org
Sent: Thursday, March 28, 2013 11:40:21 AM
Subject: Re: lots of extra bytes on disk

Oh, and since our LCS files were 10MB each, it was easy to tell which files
had not converted yet.  Also, we ended up blowing away a CF on node 5 (of 6)
and running a full repair on that CF, and afterwards that node was back to a
normal size as well.
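
Something like this is how we spotted them (path is made up; adjust for your
layout):

    # LCS rewrites sstables to ~10MB each, so anything much bigger is an
    # old STCS sstable that has not been converted yet
    find /var/lib/cassandra/data/MyKeyspace/MyCF -name '*-Data.db' -size +15M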

Dean

On 3/28/13 12:35 PM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

>We had a runaway STCS like this due to our own mistakes but were not sure
>how to clean it up.  We switched from STCS to LCS and that seemed to bring
>it way back down, since STCS left repeated data across SSTables, which LCS
>mostly avoids.  I can't help much more than that, though.
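>
>For what it's worth, in 1.1 the switch is just a schema change via
>cassandra-cli, something like this (from memory, so double-check the
>syntax on your version):
>
>    $ cassandra-cli -h localhost
>    [default@unknown] use MyKeyspace;
>    [default@MyKeyspace] update column family MyCF
>        with compaction_strategy = 'LeveledCompactionStrategy'
>        and compaction_strategy_options = {sstable_size_in_mb: 10};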
>
>Dean
>
>On 3/28/13 12:31 PM, "Ben Chobot" <be...@instructure.com> wrote:
>
>>Sorry to make it confusing. I didn't have snapshots on some nodes; I just
>>made a snapshot on a node with this problem.
>>
>>So to be clear, on this one example node....
>> Cassandra reports ~250GB of space used
>> In a CF data directory (before snapshots existed), du -sh showed ~550GB
>> After the snapshot, du in the same directory still showed ~550GB
>>(they're hard links, so that's correct)
>> du in the snapshot directory for that CF shows ~250GB, and ls shows ~50
>>fewer files.
>>
>>
>>
>>On Mar 28, 2013, at 11:10 AM, Hiller, Dean wrote:
>>
>>> I am confused.  I thought you said you don't have a snapshot.  df/du
>>> reports space used by existing data AND the snapshot.  Cassandra only
>>> reports on space used by actual data... if you move the snapshots, does
>>> df/du match what cassandra says?
>>> 
>>> Dean
>>> 
>>> On 3/28/13 12:05 PM, "Ben Chobot" <be...@instructure.com> wrote:
>>> 
>>>> ...though interestingly, the snapshots of these CFs have the "right"
>>>> amount of data in them (i.e. it agrees with the live SSTable size
>>>> reported by cassandra). Is it total insanity to remove the files from
>>>> the data directory not included in the snapshot, so long as they were
>>>> created before the snapshot?
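>>>>
>>>> If it isn't insane, my plan would be to diff the two listings and move
>>>> (not delete) whatever isn't in the snapshot, roughly like this (paths
>>>> and snapshot tag made up):
>>>>
>>>>    cd /var/lib/cassandra/data/MyKeyspace/MyCF
>>>>    # list the *.db files that exist here but not in the snapshot
>>>>    comm -23 <(ls -- *.db | sort) <(ls snapshots/mytag | sort) \
>>>>        > not-in-snapshot.txt
>>>>    # park them somewhere instead of deleting outright
>>>>    mkdir -p /tmp/suspect-sstables
>>>>    xargs -a not-in-snapshot.txt mv -t /tmp/suspect-sstables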
>>>> 
>>>> On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:
>>>> 
>>>>> Have you cleaned up your snapshots?  Those take extra space and don't
>>>>> just go away unless you delete them.
>>>>> 
>>>>> Dean
>>>>> 
>>>>> On 3/28/13 11:46 AM, "Ben Chobot" <be...@instructure.com> wrote:
>>>>> 
>>>>>> Are you also running 1.1.5? I'm wondering (ok, hoping) that this
>>>>>> might be fixed if I upgrade.
>>>>>> 
>>>>>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>>>>>> 
>>>>>>> We occasionally (twice now on a 40-node cluster over the last 6-8
>>>>>>> months) see this.  My best guess is that Cassandra can somehow fail
>>>>>>> to mark an SSTable for cleanup.  Forced GCs or reboots don't clear
>>>>>>> them out.  We disable thrift and gossip; drain; snapshot; shut down;
>>>>>>> clear data/Keyspace/Table/*.db and restore (hard-linking back into
>>>>>>> place to avoid data transfer) from the just-created snapshot; restart.
>>>>>>> 
>>>>>>> 
>>>>>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot <be...@instructure.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Some of my cassandra nodes in my 1.1.5 cluster show a large
>>>>>>>> discrepancy between what cassandra says the SSTables should sum up
>>>>>>>> to, and what df and du claim exists. During repairs this is almost
>>>>>>>> always pretty bad, but post-repair compactions tend to bring those
>>>>>>>> numbers to within a few percent of each other... usually. Sometimes
>>>>>>>> they remain much further apart after compactions have finished -
>>>>>>>> for instance, I'm looking at one node now that claims to have 205GB
>>>>>>>> of SSTables, but actually has 450GB of files living in that CF's
>>>>>>>> data directory. No pending compactions, and the most recent
>>>>>>>> compaction for this CF finished just a few hours ago.
>>>>>>>> 
>>>>>>>> nodetool cleanup has no effect.
>>>>>>>> 
>>>>>>>> What could be causing these extra bytes, and how do I get them to
>>>>>>>> go away? I'm ok with a few extra GB of unexplained data, but an
>>>>>>>> extra 245GB (more than all the data this node is supposed to have!)
>>>>>>>> is a little extreme.
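>>>>>>>>
>>>>>>>> For reference, the two numbers I'm comparing come from roughly this
>>>>>>>> (paths approximate):
>>>>>>>>
>>>>>>>>    # "Space used (live)" as cassandra reports it for the CF
>>>>>>>>    nodetool -h localhost cfstats | grep -A 20 'Column Family: MyCF'
>>>>>>>>    # vs. what's actually sitting on disk
>>>>>>>>    du -sh /var/lib/cassandra/data/MyKeyspace/MyCF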
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
>

