On 11/15/2010 12:09 PM, Jonathan Ellis wrote:
> On Mon, Nov 15, 2010 at 1:03 PM, Reverend Chip <rev.c...@gmail.com> wrote:
>> I find X.21's data disk is full.  "nodetool ring" says that X.21 has a
>> load of only 326.2 GB, but the 1T partition is full.
> Load only tracks live data -- is the rest tmp files?
No, there are a lot of non-tmp files that were not included in the load
figure.  Having stopped the server and deleted the tmp files, the data
are still using far more space than "ring" claimed -- and too much for
"cleanup" to work, as well:

    Filesystem              Size  Used Avail Use% Mounted on
    /dev/mapper/flashcache  932G  723G  162G  82% /var/lib/cassandra/data

Given that the previous situation included incomplete replication, I
can't just kill the node and let it repopulate.  So I can either magick
up more disk space or reload the whole cluster. :-(  Is there anything
about the node's data directory that you need to see?  Or is it reload
time?

>> Somehow repair decided it needed to triple the data usage.  I would like
>> to understand this, and I invite recovery suggestions.
> We have https://issues.apache.org/jira/browse/CASSANDRA-1674 open for
> repair space usage.

OK, thanks.

>> Meanwhile, I find this interesting sequence on X.20.
>> It could indicate compaction interfered with repair.  Note the presence
>> of TestAttrs-e-332 in the compaction:
>>
>>  INFO 10:56:00,283 Compacting
>> [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db'),
>>  org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-351-Data.db'),
>>  org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-368-Data.db'),
>>  org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db'),
>>  org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-401-Data.db'),
>>  org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/Attrs/TestAttrs-e-414-Data.db')]
>> ...
>>  INFO 01:26:56,175 Deleted /var/lib/cassandra/data/Attrs/TestAttrs-e-332-<>
>> ...
>>  INFO 22:42:10,032 Need to re-stream file
>> /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db to /X.21
>> ERROR 22:42:16,822 Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.io.IOException: Broken pipe
>>
>> It looks like compaction and repair interfere with each other, and
>> compactions and repairs should stay out of each other's way.
> It looks to me like the sequence of events is:
> 1. streaming had a FD for -332
> 2. compaction deleted the file
> 3. X.21 ran out of space
> 4. the stream errored out
> 5. retrying the stream fails repeatedly

This is plausible, assuming streaming works off an fd and the
disappearance of the file would not disturb it (e.g., it doesn't call
stat() on the filename).

> Notably, if one thread has a file open on Linux and another
> thread/process deletes it, nothing bad happens and the file is
> unlinked when closed.  (However, attempting to do this causes errors
> on Windows, so we try to avoid it.)

Indeed, though to my mind the more serious problem with the deletion is
the inability to restart the stream if the node is rebooted.

>> It also looks like streaming doesn't recover gracefully from file deletion.
> If 5 comes before 3 then yes; otherwise I think erroring out from lack
> of disk space is the most likely culprit.

Yes, it appears that the streaming non-recovery was caused by the full
disk on the target, not by the file removal.
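For anyone following along, the open-but-deleted behavior Jonathan describes
is easy to demonstrate.  This is a sketch of the POSIX semantics only, not of
Cassandra's actual streaming code:

```python
import os
import tempfile

# A reader holding an open fd keeps reading after another process
# unlinks the file; the inode is freed only when the last fd closes.
fd, path = tempfile.mkstemp()
os.write(fd, b"sstable bytes")
os.lseek(fd, 0, os.SEEK_SET)

os.unlink(path)                      # the "compaction" deletes the file
assert not os.path.exists(path)      # the name is gone...

data = os.read(fd, 1024)             # ...but the open fd still reads fine
os.close(fd)                         # inode freed only now
print("read after unlink:", data.decode())
```

This also shows why the error in the log is a broken pipe from the full
target disk rather than a read failure on the deleted -332 file: the sender's
fd remained valid the whole time.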