It's no longer recommended to run nodetool compact regularly, as it can mean 
that some tombstones do not get purged for a very long time. Minor compaction 
is all you need to keep things in check; however, 658 does seem like a lot of 
SSTables. Were some of those files already compacted? How many SSTables are 
reported via JConsole for the CFs?
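
If JConsole is a pain, nodetool cfstats reports the same number; roughly 
(host is a placeholder, and the grep is just to narrow the output):

    nodetool -h <node-ip> cfstats | grep -A 2 'Column Family: MeterRecords'

and look at the "SSTable count" line.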

As for the error: the receiving side logs it at DEBUG level when the 
connection fails. Not sure how useful it will be, but if you set logging to 
DEBUG for the org.apache.cassandra.net.IncomingStreamReader logger it may help 
identify what the receiving end saw when the socket closed.
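
If you're on the stock logging setup that should just be a one line addition 
to conf/log4j-server.properties on the receiving node (file name assumed from 
the default 0.7 layout):

    log4j.logger.org.apache.cassandra.net.IncomingStreamReader=DEBUG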

It would also be interesting to know if it fails at the exact same place every 
time. The progress value in the INFO log messages on the receiving side says 
how many bytes have been received.
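
Something like this on the receiving node will pull those values out so you 
can compare attempts (log path is the usual default, adjust if you've moved it):

    grep 'progress=' /var/log/cassandra/system.log

If it dies at the same byte count every time, that points at the file rather 
than the network.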

Are the nodes in the same AZ? Same region? Anything interesting about the 
networking?

Check nodetool ring to see if the node you are trying to decommission still 
owns its token. This will also tell you if the other nodes still see it as 
leaving.
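
Worth running it from a couple of nodes and comparing views, e.g. (hosts are 
placeholders):

    nodetool -h <decommissioning-node> ring
    nodetool -h <some-other-node> ring

The decommissioning node should show as Leaving in everyone's view of the ring.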

If you cannot get the node to decommission, you could try shutting it down and 
using nodetool removetoken from another node: 
http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely
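
Roughly (host and token are placeholders, take the token from the ring output):

    nodetool -h <live-node> removetoken <token-of-the-downed-node>

I think 0.7 also has nodetool removetoken status / force if it appears to 
hang, but double check that on your version.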

One other thing: check the data directory on the receiving side. It may still 
have some partially written tmp files left over from the failed streaming.
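
The leftovers have "tmp" in their file names, so something along these lines 
(directory taken from your log output) should show them:

    ls -lh /raiddrive/MDR/ | grep tmp

They should be safe to remove once nothing is actively streaming to that node.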
 
Hope that helps. 



On 4 May 2011, at 12:29, tamara.alexan...@accenture.com wrote:

> Hi all,
>  
> I ran decommission on a node in my 32 node cluster. After about an hour of 
> streaming files to another node, I got this error on the node being 
> decommissioned:
> INFO [MiscStage:1] 2011-05-03 21:49:00,235 StreamReplyVerbHandler.java (line 
> 58) Need to re-stream file /raiddrive/MDR/MeterRecords-f-2283-Data.db to 
> /10.206.63.208
> ERROR [Streaming:1] 2011-05-03 21:49:01,580 DebuggableThreadPoolExecutor.java 
> (line 103) Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at 
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
>         at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:105)
>         at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:67)
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> ERROR [Streaming:1] 2011-05-03 21:49:01,581 AbstractCassandraDaemon.java 
> (line 112) Fatal exception in thread Thread[Streaming:1,1,main]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at 
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
>         at 
> org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:105)
>         at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:67)
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
>  
> And this message on the node that it was streaming to:
> INFO [Thread-333] 2011-05-03 21:49:00,234 StreamInSession.java (line 121) 
> Streaming of file 
> /raiddrive/MDR/MeterRecords-f-2283-Data.db/(98605680685,197932763967)
>          progress=49016107008/99327083282 - 49% from 
> org.apache.cassandra.streaming.StreamInSession@33721219 failed: requesting a 
> retry.
>  
> I tried running decommission again (and running scrub + decommission), but I 
> keep getting this error on the same file.
>  
> I checked out the file and saw that it is a lot bigger than all the other 
> sstables… 184GB instead of about 74MB. I haven’t run a major compaction for a 
> bit, so I’m trying to stream 658 sstables.
>  
> I’m using Cassandra 0.7.4, I have two data directories (I know that’s not 
> good practice…), and all my nodes are on Amazon EC2.
>  
> Any thoughts on what could be going on or how to prevent this?
>  
> Thanks!
> Tamara
>  
>  
