I have run into the same problem before:


I have not found a solution yet.


On Mon, Jul 16, 2012 at 11:10 AM, Bart Swedrowski <b...@timedout.org> wrote:

> On 16 July 2012 11:25, aaron morton <aa...@thelastpickle.com> wrote:
>> In the before time someone had problems with a switch/router that was
>> dropping persistent but idle connections. Doubt this applies, and it would
>> probably result in an error, just throwing it out there.
> Yes, been through them a few times.  There are literally no errors or
> warnings at all.  And sometimes, as aforementioned, there's actually an INFO
> message that the merkle tree has been sent while the other side is not
> receiving it.
> Just now, I kicked off manual repair on node with IP and
> just got stuck on streaming files again.
> Node
> Streaming from: /
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 progress=0/5096 - 0%
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 progress=0/1548510 - 0%
>    Medals: /var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 progress=0/82859 - 0%
> Node
> Streaming to: /
>    /var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 progress=168/168 - 100%
>    /var/lib/cassandra/data/Medals/dataa-hd-1128-Data.db sections=244 progress=0/1548510 - 0%
>    /var/lib/cassandra/data/Medals/dataa-hd-1127-Data.db sections=46 progress=0/5096 - 0%
>    /var/lib/cassandra/data/Medals/dataa-hd-1119-Data.db sections=228 progress=0/82859 - 0%
> Looks like streaming this specific SSTable hasn't finished (or been ACKed
> on the other side)
>    /var/lib/cassandra/data/Medals/dataa-hd-1129-Data.db sections=2 progress=168/168 - 100%
> This morning I tightened monitoring, so each node now monitors the others
> with ICMP packets (20 every minute), and monitoring is silent: no
> issues reported since the morning, not a single packet lost.
> I got some help from the Acunu guys.  At first we believed we had fixed the
> problem by disabling bonding on the servers, which we blamed for messing up
> interrupts; however, this morning the problem resurfaced.
> I can see (and Acunu says) everything points to a network-related
> problem (although I'd expect the IP stack to correct simple packet loss),
> but there's no way to back this up (unless only Cassandra-related traffic
> is getting lost, but *how* to monitor for it???).
> Honestly, running out of ideas - further advice highly appreciated.
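
Since the ICMP checks stay silent while the streams sit at 0%, one option
is to watch the stream progress itself instead of the network. Below is a
minimal sketch, assuming output shaped like the listings quoted above
(presumably from `nodetool netstats`): a path ending in -Data.db, then
sections=N and progress=done/total. The names parse_streams and stalled
are hypothetical helpers, not part of Cassandra or nodetool, and the sketch
only detects a stall; it does not explain or fix it.

```python
import re

# Matches one stream entry such as:
#   Medals: /var/lib/.../dataa-hd-1127-Data.db sections=46 progress=0/5096 - 0%
# The entry layout is an assumption taken from the listings quoted above.
ENTRY = re.compile(
    r"(?P<path>/\S+-Data\.db)\s+sections=(?P<sections>\d+)\s+"
    r"progress=(?P<done>\d+)/(?P<total>\d+)"
)


def parse_streams(text):
    """Return {path: (done, total)} for every stream entry found in text."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    return {
        m["path"]: (int(m["done"]), int(m["total"]))
        for m in ENTRY.finditer(flat)
    }


def stalled(before, after):
    """Paths that are unfinished and made no progress between two snapshots."""
    return sorted(
        path
        for path, (done, total) in after.items()
        if done < total and before.get(path, (None, None))[0] == done
    )
```

Taking two snapshots on the same node a few minutes apart and passing the
parsed results to stalled() gives the list of files that have not moved in
that interval. A file that only shows up in the second snapshot is not
reported, since there is nothing to compare it with yet.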
