Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-21 Thread Paul Pollack
So I got to the bottom of this -- it turns out it's not an issue with Cassandra at all. It seems that when these instances were set up we had originally mounted 2TB drives from /dev/xvdc and those were persisted to /etc/fstab, but at some point someone unmounted those and replaced them with 4TB drive ...
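
For anyone chasing a similar mismatch, a minimal sketch of the kind of check that would have caught it; the /dev/xvdc device name comes from the thread, but the data directory path is an assumed default:

    # Compare what the kernel actually has mounted against what /etc/fstab still claims.
    lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT      # real block devices and their sizes
    grep xvdc /etc/fstab                      # stale fstab entry left over from the old 2TB drive
    df -h /var/lib/cassandra/data             # the volume the data directory really lives on
    # On newer util-linux (2.29+), fstab can be sanity-checked directly:
    findmnt --verify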

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-21 Thread Paul Pollack
Thanks for the suggestions, guys. Nicolas, I just checked nodetool listsnapshots and it doesn't seem like those are causing the increase:

Snapshot Details:
Snapshot name                            Keyspace name   Column family name   True size   Size on disk
1479343904106-statistic_segment_timel ...
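
If the listsnapshots output is awkward to total up, the on-disk footprint can also be summed directly; a sketch assuming the default data directory layout (the paths are not from the thread):

    nodetool listsnapshots
    # Snapshots are hard links, so this over-counts space that clearing them would actually free,
    # but it gives an upper bound per data volume.
    du -sch /var/lib/cassandra/data/*/*/snapshots 2>/dev/null | tail -n 1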

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-21 Thread Nicolas Guyomar
Hi Paul, this might be a long shot, but some repairs might fail to clear their snapshot (not sure if that's still the case with C* 3.7, however; I had the problem on the 2.X branch). What does nodetool listsnapshots indicate?
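
A minimal sketch of checking for and clearing a stale repair snapshot; the tag and keyspace below are placeholders, not values from the thread:

    # List snapshots, then drop a stale one by its tag (clearsnapshot accepts -t on the 3.x branch).
    nodetool listsnapshots
    nodetool clearsnapshot -t <snapshot_tag> -- <keyspace_name>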

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread kurt greaves
Repair does overstream by design, so if that node is inconsistent you'd expect a bit of an increase. If you've got a backlog of compactions, that's probably due to the repair and likely the cause of the increase. If you're really worried you can do a rolling restart to stop the repair, otherwise maybe try in ...
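
A quick sketch of how one might confirm that streaming and the compaction backlog account for the growth (standard nodetool subcommands, nothing specific to this cluster):

    nodetool compactionstats -H   # pending compactions and bytes remaining
    nodetool netstats             # streams the repair is still receiving
    nodetool tpstats              # watch for pending CompactionExecutor tasks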

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Just a quick additional note -- we have checked, and this is the only node in the cluster exhibiting this behavior; disk usage is steady on all the others. CPU load on the repairing node is slightly higher, but nothing significant.
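
For comparing nodes side by side, a simple sketch; the data directory path is an assumed default, not from the thread:

    nodetool status                   # the Load column should be roughly even across nodes
    df -h /var/lib/cassandra/data     # run on each host to compare the data volume itself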

Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Hi, I'm running a repair on a node in my 3.7 cluster and today got alerted on disk space usage. We keep the data and commit log directories on separate EBS volumes. The data volume is 2TB. The node went down due to EBS failure on the commit log drive. I stopped the instance and was later told by A ...
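
A minimal sketch of keeping an eye on the volumes while the repair runs; the mount points for the separate data and commit log EBS volumes are assumptions, not from the thread:

    # Refresh every minute so a runaway increase is caught early.
    watch -n 60 'df -h /var/lib/cassandra/data /var/lib/cassandra/commitlog'
    lsblk -o NAME,SIZE,MOUNTPOINT     # confirm which EBS volume backs each directory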