> There were few Compacted files. I thought that might have been the cause,
> but that wasn't it. We have a CF that is 23GB, and while repair is running,
> multiple instances of that CF are created along with other CFs.
To confirm - are you saying the data directory size is huge, but the live size as reported by nodetool ring and nodetool info does NOT reflect this inflated size? What files *do* you have in the data directory? Any left-over *tmp* files, for example? Are you sure you're only running a single repair at a time?

(Sorry if this was covered; I did a quick sweep through the thread history because I was unsure whether I was confusing two different threads, and I don't think I am.)

The question is what's taking the space. If it's sstables, they really should be either compacted ones that are marked for deletion but still being retained, or "live" sstables, in which case they should show up as load in nodetool.

What else... maybe streams are being re-tried from the source nodes and the disk space is coming from a bunch of half-finished streams of the same data. But if so, those should be *tmp* files IIRC.

I'm just wildly speculating, but it would be nice to get to the bottom of this.
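To make those checks concrete, a rough sketch of what I'd run (the data directory path is an assumption - substitute whatever your config points your data directory at):

  # What Cassandra reports as live vs. what's actually on disk
  nodetool info                      # check the "Load" line
  nodetool ring                      # per-node load across the ring
  du -sh /var/lib/cassandra/data     # assumed default path; use your own

  # Half-finished streams should show up as temporary files
  find /var/lib/cassandra/data -name '*tmp*' -ls

  # Compacted sstables still being retained have marker files
  find /var/lib/cassandra/data -name '*Compacted*' -ls

If the *tmp* files account for the missing space, that would support the retried-streams theory; if the Compacted markers do, it's dead sstables being retained.

--
/ Peter Schuller (@scode on twitter)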