Certainly sounds a bit sick. The first error looks like it happens when the index file points to the wrong place in the data file for the SSTable. The second one happens when the index file itself is corrupted. These should both be problems that nodetool scrub can fix.
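For reference, running scrub against the affected column families would look something like the line below. The keyspace and column family names (MyKeyspace, MyColumnFamily) are placeholders, so substitute your own; scrub takes a snapshot and then rewrites the SSTables it touches, so leave it some headroom in disk space and I/O while it runs:

    nodetool -h localhost scrub MyKeyspace MyColumnFamily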
The extra disk space may be dead space as far as Cassandra is concerned, left behind by compaction or some streaming failure. You can check how much space it considers to be live (in use) using nodetool cfstats. This will also tell you how many SSTables are live. Having a lot of dead SSTables is not necessarily a bad thing.

What are the pending tasks? What is nodetool tpstats showing? And what does nodetool ring show from one of the other nodes? (Example invocations for these are at the bottom of this mail.)

I'm assuming there are no errors in the logs on the node. What are the most recent INFO messages?

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 4 May 2011, at 17:54, Héctor Izquierdo Seliva wrote:

>
> Hi Aaron
>
> It has no data files whatsoever. The upgrade path is 0.7.4 -> 0.7.5. It
> turns out the initial problem was the sw raid failing silently because
> of another faulty disk.
>
> Now that the storage is working, I brought up the node again, same IP,
> same token, and tried doing nodetool repair.
>
> All adjacent nodes have finished the streaming session, and now the node
> has a total of 248 GB of data. Is this normal when the load per node is
> about 18 GB?
>
> Also, there are 1245 pending tasks. It's been compacting or rebuilding
> sstables for the last 8 hours non-stop. There are 2057 sstables in the
> data folder.
>
> Should I have done things differently, or is this the normal behaviour?
>
> Thanks!
>
> On Wed, 2011-05-04 at 07:54 +1200, aaron morton wrote:
>> When you say "it's clean" does that mean the node has no data files?
>>
>> After you replaced the disk, what process did you use to recover?
>>
>> Also, what version are you running and what's the recent upgrade history?
>>
>> Cheers
>> Aaron
>>
>> On 3 May 2011, at 23:09, Héctor Izquierdo Seliva wrote:
>>
>>> Hi everyone. One of the nodes in my 6-node cluster died with disk
>>> failures. I have replaced the disks, and it's clean. It has the same
>>> configuration (same IP, same token).
>>>
>>> When I try to restart the node it starts to throw mmap underflow
>>> exceptions until it closes again.
>>>
>>> I tried setting io to standard, but it still fails. It gives errors
>>> about two decorated keys being different, and the EOFException.
>>>
>>> Here is an excerpt of the log:
>>>
>>> http://pastebin.com/ZXW1wY6T
>>>
>>> I can provide more info if needed. I'm at a loss here, so any help is
>>> appreciated.
>>>
>>> Thanks all for your time
>>>
>>> Héctor Izquierdo
>>>
>>
>
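PS: example invocations for the diagnostics mentioned above. The host name here (localhost) is just an assumption; point -h at the node you want to inspect, and run ring from one of the other nodes to see how they view the rebuilt one:

    nodetool -h localhost cfstats
    nodetool -h localhost tpstats
    nodetool -h localhost ring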