Re: repair never completes with "finished successfully"

Jonathan Colby Tue, 12 Apr 2011 05:58:07 -0700

There is no "Repair session" message either.   It just starts with a message 
like:


INFO [manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723] 2011-04-10 
14:00:59,051 AntiEntropyService.java (line 770) Waiting for repair requests: 
[#<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, 
/10.46.108.101, (DFS,main)>, #<TreeRequest 
manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.100, 
(DFS,main)>, #<TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, 
/10.47.108.102, (DFS,main)>, #<TreeRequest 
manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.101, (DFS,main)>]

NETSTATS:

Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         150846
Responses                       n/a         0         443183

One node in our cluster still has "unreadable rows", where the reads trip up 
every time for certain sstables (you've probably seen my earlier threads 
regarding that).   My suspicion is that the bloom filter read on the node with 
the corrupt sstables is never reporting back to the repair, thereby causing it 
to hang.
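
To see which neighbour never reports back, the log can be scanned for TreeRequest entries that were announced but never followed by "completed successfully". The following is only a rough sketch: it assumes the two log formats quoted in this thread, and the function name and regular expression are hypothetical, not part of Cassandra or nodetool.

```python
import re
from collections import defaultdict

# Matches one request as it appears in the log excerpts in this thread, e.g.
#   #<TreeRequest manual-repair-<uuid>, /10.46.108.101, (DFS,main)>
# The pattern is an assumption based on those excerpts only.
TREE_REQUEST = re.compile(
    r"#<TreeRequest\s+(manual-repair-[0-9a-f-]+),\s*/([\d.]+),\s*\(([^)]*)\)>"
)


def outstanding_requests(log_text):
    """Return {session: {(endpoint, "keyspace,cf"), ...}} for requests
    that were logged but never marked "completed successfully"."""
    # Log lines may be wrapped (as in this mail), so collapse all
    # whitespace before matching.
    text = " ".join(log_text.split())
    pending = defaultdict(set)
    for m in TREE_REQUEST.finditer(text):
        session, key = m.group(1), (m.group(2), m.group(3))
        # Look at the text directly after the request to see whether
        # this occurrence is a completion message.
        tail = text[m.end():m.end() + 40].lstrip()
        if tail.startswith("completed successfully"):
            pending[session].discard(key)
        else:
            pending[session].add(key)
    # Keep only sessions that still have something outstanding.
    return {s: reqs for s, reqs in pending.items() if reqs}


if __name__ == "__main__":
    import sys

    # Usage: python outstanding.py /path/to/cassandra.log
    with open(sys.argv[1]) as fh:
        for session, reqs in outstanding_requests(fh.read()).items():
            for endpoint, cf in sorted(reqs):
                print(session, endpoint, cf)
```

It relies on the "Waiting for repair requests" line appearing before the matching completion lines, which is how the entries are ordered in the log here. Any endpoint it prints would be the first place to look for the unreadable sstables.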


What would be great is a scrub tool that ignores unreadable/unserializable 
rows!  : )
 

On Apr 12, 2011, at 2:15 PM, aaron morton wrote:

> Do you see a message starting "Repair session " and ending with "completed 
> successfully" ?
> 
> Or do you see any streaming activity using "nodetool netstats"?
> 
> Repair can hang if a neighbour dies and fails to send a requested stream. It 
> will time out after 24 hours (I think). 
> 
> Aaron
> 
> On 12 Apr 2011, at 23:39, Karl Hiramoto wrote:
> 
>> On 12/04/2011 13:31, Jonathan Colby wrote:
>>> There are a few other threads related to problems with the nodetool repair 
>>> in 0.7.4.  However I'm not seeing any errors, just never getting a message 
>>> that the repair completed successfully.
>>> 
>>> In my production and test clusters (with just a few MB of data), the 
>>> nodetool repair prompt never returns and the last entry in cassandra.log 
>>> is always something like:
>>> 
>>> #<TreeRequest manual-repair-f739ca7a-bef8-4683-b249-09105f6719d9, 
>>> /10.46.108.102, (DFS,main)>  completed successfully: 1 outstanding
>>> 
>>> But I don't see a message, even hours later, that the 1 outstanding request 
>>> "finished successfully".
>>> 
>>> Anyone else experience this?  These are physical server nodes in local data 
>>> centers and not EC2
>>> 
>> 
>> I've seen this. To fix it, try a "nodetool compact" and then repair.
>> 
>> 
>> --
>> Karl
>
