On May 14, 2013, at 6:50 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> Let's say we're seeing some bug in C*, and SSTables don't get deleted
>> during compaction (which I guess is the only reason for this consumption
>> of diskspace).
>
> Just out of interest, can you check the number of SSTables reported by
> nodetool cfstats for a CF against the number of *-Data.db files in the
> appropriate directory on disk?
> Another test is to take a snapshot and see if there are files in the live
> directory not in the snapshot dir.
>
> Either of these techniques may identify SSTables on disk that the server
> is not tracking.
>
> Cheers

Currently we see 9272 Data.db files, but only 8944 are reported by nodetool
cfstats. However, C* 1.2.4 seems to have corrected the problem, as it has
recovered most of the used space. Still waiting for the compactions to
complete, though; I'll check again once compaction is done.
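In case it's useful to anyone else hitting this, here is roughly how we
compared the two numbers. It's a quick Python sketch, not a polished tool;
the keyspace, column family, and data directory below are placeholders, not
our real names:

    #!/usr/bin/env python
    # Quick sketch: compare the SSTable count that nodetool cfstats
    # reports for one CF with the number of *-Data.db files on disk.
    # KEYSPACE, CF and DATA_DIR are placeholders; adjust for your setup.
    import glob
    import os
    import subprocess

    KEYSPACE = "metrics"     # placeholder keyspace name
    CF = "timeseries"        # placeholder column family name
    DATA_DIR = "/var/lib/cassandra/data/%s/%s" % (KEYSPACE, CF)

    # What the server thinks it has. 1.2-era cfstats prints a
    # "Column Family:" header per CF, followed by an "SSTable count:"
    # line (newer versions print "Table:" instead).
    out = subprocess.check_output(["nodetool", "cfstats"],
                                  universal_newlines=True)
    tracked = None
    in_cf = False
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("Column Family:"):
            in_cf = line.split(":", 1)[1].strip() == CF
        elif in_cf and line.startswith("SSTable count:"):
            tracked = int(line.split(":", 1)[1])
            break

    # What is actually on disk. Note that in-flight compactions write
    # -tmp- files that also match, so check while the node is quiet.
    on_disk = len(glob.glob(os.path.join(DATA_DIR, "*-Data.db")))

    print("cfstats SSTable count: %s" % tracked)
    print("*-Data.db on disk:     %d" % on_disk)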
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/05/2013, at 8:33 PM, Nicolai Gylling <n...@issuu.com> wrote:
>
>>> On Wed, May 8, 2013 at 10:43 PM, Nicolai Gylling <n...@issuu.com> wrote:
>>>> At the time of normal operation there was 800 GB of free space on each
>>>> node. After the crash, C* started using a lot more, resulting in an
>>>> out-of-diskspace situation on 2 nodes, i.e. C* used up the 800 GB in
>>>> just 2 days, giving us very little time to do anything about it, since
>>>> repairs/joins take a considerable amount of time.
>>>
>>> Did someone do a repair? Repair very frequently results in (usually
>>> temporary) >2x disk consumption.
>>>
>> Repair runs regularly once a week, and normally doesn't take up much
>> space, as we're using Leveled Compaction Strategy.
>>
>>
>>>> What can make C* suddenly use this amount of disk space? We did see a
>>>> lot of pending compactions on one node (7k).
>>>
>>> Mostly repair.
>>>
>>>> Any tips on recovering from an out-of-diskspace situation on multiple
>>>> nodes? I've tried moving some SSTables away, but C* seems to use
>>>> whatever space I free up in no time. I'm not sure if any of the nodes
>>>> is fully updated, as 'nodetool status' reports 3 different loads.
>>>
>>> A relevant note here is that moving sstables out of the full partition
>>> while Cassandra is running will not result in any space recovery,
>>> because Cassandra still has an open filehandle to that sstable. In
>>> order to deal with an out-of-disk-space condition you need to stop
>>> Cassandra. Unfortunately the JVM stops responding to clean shutdown
>>> requests when the disk is full, so you will have to kill -KILL the
>>> process.
>>>
>>> If you have a lot of overwrites/fragmentation, you could attempt to
>>> clear enough space to do a major compaction of the remaining data, do
>>> that major compaction, split your One Huge sstable with the
>>> (experimental) sstable_split tool, and then copy the temporarily moved
>>> sstables back onto the node. You could also attempt to use user-defined
>>> compaction (via the JMX endpoint) to strategically compact such data.
>>> If you grep for compaction in your logs, do you see compactions
>>> resulting in smaller output file sizes? ("compacted to X% of original"
>>> messages)
>>>
>>> I agree with Alexis Rodriguez that Cassandra 1.2.0 is not a version
>>> anyone should run; it contains significant bugs.
>>>
>>> =Rob
>>
>> We're storing time series, so we don't have any overwrites and hardly
>> any reduction in size during compaction. I'll try to upgrade and see if
>> that can help get some diskspace back.
>>
>> Let's say we're seeing some bug in C*, and SSTables don't get deleted
>> during compaction (which I guess is the only reason for this consumption
>> of diskspace). Will C* 1.2.4 be able to fix this? Or would it be a
>> better solution to replace one node at a time, so we're sure to only
>> have the data that C* knows about?
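P.S. Aaron's second test (snapshot, then diff the live directory against
the snapshot) can be scripted along the same lines. Again just a sketch
with the same placeholder names, and the snapshot path assumes the 1.2
directory layout:

    #!/usr/bin/env python
    # Sketch of the snapshot test. A snapshot hard-links every live
    # SSTable the server is tracking, so a *-Data.db present in the
    # live directory but absent from the snapshot is a file the server
    # no longer knows about. Names and paths below are placeholders;
    # the snapshots/ location matches 1.2, other versions may differ.
    import glob
    import os
    import subprocess

    KEYSPACE = "metrics"     # placeholder keyspace name
    CF = "timeseries"        # placeholder column family name
    TAG = "leak-check"
    LIVE = "/var/lib/cassandra/data/%s/%s" % (KEYSPACE, CF)
    SNAP = os.path.join(LIVE, "snapshots", TAG)

    subprocess.check_call(["nodetool", "snapshot", "-t", TAG, KEYSPACE])

    live = set(os.path.basename(p)
               for p in glob.glob(os.path.join(LIVE, "*-Data.db")))
    snap = set(os.path.basename(p)
               for p in glob.glob(os.path.join(SNAP, "*-Data.db")))

    for name in sorted(live - snap):
        print("not tracked by the server: %s" % name)

    # Drop the snapshot again when done.
    subprocess.check_call(["nodetool", "clearsnapshot", "-t", TAG, KEYSPACE])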