Even if it is a network error, it would be good to detect it.
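ICMP pings do not exercise the TCP connections Cassandra actually uses between nodes, so a check that opens a real TCP connection to each peer may be closer to the traffic in question. A minimal sketch, assuming the nodes use Cassandra's default internode storage_port of 7000 (check cassandra.yaml); the function name `probe` is hypothetical:

```python
import socket
import time


def probe(host: str, port: int = 7000, timeout: float = 2.0):
    """Try a TCP connect to host:port.

    Returns (True, seconds_taken) on success, or (False, error_message)
    on refusal, timeout or any other socket error. Port 7000 is assumed
    to be Cassandra's default storage_port (internode traffic).
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:  # socket.timeout is a subclass of OSError
        return False, str(exc)
```

Run from each node against the others (e.g. `probe("192.168.81.5")`) on the same schedule as the ICMP checks. It only proves the handshake succeeds; it would not catch a device that drops long-lived idle connections, which would need a connection held open for a while.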

If you can run a small repair with those log settings, I can take a look at
the logs if you want. I can't promise anything, but another set of eyes may help.

Ping me off list if you want to send me the logs. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/07/2012, at 4:32 AM, Bill Au wrote:

> I ran into the same problem before:
> 
> http://comments.gmane.org/gmane.comp.db.cassandra.user/25334
> 
> I have not found any solutions yet.
> 
> Bill
> 
> On Mon, Jul 16, 2012 at 11:10 AM, Bart Swedrowski <b...@timedout.org> wrote:
> 
> 
> On 16 July 2012 11:25, aaron morton <aa...@thelastpickle.com> wrote:
> In the before time someone had problems with a switch/router that was
> dropping persistent but idle connections. I doubt this applies, and it would
> probably result in an error; just throwing it out there.
> 
> Yes, been through them a few times. There are literally no errors or warnings
> at all. And sometimes, as mentioned, there's actually an INFO message saying the
> merkle tree has been sent, but the other side never receives it.
> 
> Just now, I kicked off a manual repair on the node with IP 192.168.94.178, and
> it got stuck streaming files again.
> 
> Node 192.168.94.179:
> 
> Streaming from: /192.168.81.5
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 progress=0/5096 - 0%
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 progress=0/1548510 - 0%
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 progress=0/82859 - 0%
> 
> Node 192.168.81.5:
> 
> Streaming to: /192.168.94.179
>    /var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 progress=168/168 - 100%
>    /var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 progress=0/1548510 - 0%
>    /var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 progress=0/5096 - 0%
>    /var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 progress=0/82859 - 0%
> 
> It looks like streaming of this specific SSTable hasn't finished (or hasn't been
> ACKed by the other side):
> 
>    /var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 progress=168/168 - 100%
> 
> This morning I tightened monitoring, so now every node monitors every other
> node with ICMP packets (20 every minute). Monitoring is silent: no issues
> reported since this morning, not a single packet lost.
> 
> I got some help from the Acunu guys. At first we believed we had fixed the
> problem by disabling bonding on the servers, blaming it for messing things up
> with interrupts; however, this morning the problem resurfaced.
> 
> I can see (and Acunu agrees) that everything points to a network-related
> problem (although I'd expect the IP stack to correct simple packet loss), but
> there's no way to back this up, unless only Cassandra-related traffic is
> getting lost, and *how* do I monitor for that?
> 
> I'm honestly running out of ideas; further advice is highly appreciated.
> 
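The stuck-stream symptom described in the thread (files sitting at progress=0/N on both sides while no errors are logged) could be watched for by comparing two snapshots of the streaming status. A rough sketch, assuming the output quoted above is from `nodetool netstats` in the format shown and that you capture it yourself at two points in time; `parse_progress` and `stalled_files` are hypothetical helpers:

```python
import re

# One file entry as printed in the quoted output; \s+ tolerates the
# line wrapping between "sections=" and "progress=".
ENTRY = re.compile(r"(/\S+-Data\.db)\s+sections=\d+\s+progress=(\d+)/(\d+)")


def parse_progress(netstats_text: str) -> dict:
    """Map each SSTable path to (bytes_done, bytes_total)."""
    return {
        path: (int(done), int(total))
        for path, done, total in ENTRY.findall(netstats_text)
    }


def stalled_files(before: str, after: str) -> list:
    """Incomplete files whose byte count did not move between two snapshots."""
    old, new = parse_progress(before), parse_progress(after)
    return sorted(
        path
        for path, (done, total) in new.items()
        # A file missing from the first snapshot is treated as not stalled.
        if done < total and old.get(path, (None, None))[0] == done
    )
```

Alerting when `stalled_files` stays non-empty across a few minutes would at least timestamp when a repair stream stops moving, which could then be lined up against network or switch logs.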
