Thanks!

A node writing to the log because it cannot handle the load is much
different from a node writing to the log "just because". Still, the amount
of logging is excessive -- and would it really hurt anything to add
something like "can't handle load" to the exception message?

On the subject of RF:3 -- could you please elaborate?
- Why is RF:3 important (vs. e.g. 2)?
- My total replication factor is 4 across two DCs -- I suppose you mean 3
replicas in each DC, i.e. something like the statement below?
- Does that mean I'll have to run at least 4 nodes in each DC (3 for RF:3
and one additional in case one fails)?
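
(If you do mean 3 in each DC, I suppose the change would look something
like this -- in CQL3 at least, with "my_keyspace" standing in for our
actual keyspace name:)

    -- bump replication to 3 replicas in each data center
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 3, 'DC2': 3};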

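Also, to make sure I follow the hint-window suggestion: if I read the
maxHintsInProgress snippet right, the cap scales with cores (e.g. on an
8-core box that would be 1024 * 8 = 8192 in-flight hints). And cutting the
window down would mean something like this in cassandra.yaml, right?

    # stop recording hints once a target node has been down for 60 seconds
    max_hint_window_in_ms: 60000
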
(and again -- thanks Aaron! You've been helping me A LOT on this list.)
Best regards,
Sergey


aaron morton wrote
>> Replication is configured as DC1:2,DC2:2 (i.e. every node holds the
>> entire data).
> I really recommend using RF 3. 
> 
> 
> The error is the coordinator node protecting itself. 
> 
> Basically it cannot handle the volume of local writes plus the writes for
> HH (hinted handoff). The number of in-flight hints is greater than…
> 
>     private static volatile int maxHintsInProgress =
>         1024 * Runtime.getRuntime().availableProcessors();
> 
> You may be able to work around this by reducing max_hint_window_in_ms in
> the yaml file, so that hints are no longer recorded once a node has been
> down for more than, say, 1 minute. 
> 
> Anyway, I would say your test showed that the current cluster does not
> have sufficient capacity to handle the write load with one node down and
> HH enabled at the current level. You can either add more nodes, use nodes
> with more cores, adjust the HH settings, or reduce the throughput. 
> 
> 
>>> On the subject of bug report -- I probably will -- but I'll wait a bit
>>> for…
> 
> Perhaps the excessive logging could be handled better; please add a ticket
> when you have time. 
> 
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 23/01/2013, at 2:12 PM, Rob Coli <rcoli@...> wrote:
> 
>> On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir <solf.lists@...> wrote:
>>> Do you have a suggestion as to what could be a better fit for counters?
>>> Something that can also replicate across DCs and survive link breakdown
>>> between nodes (across DCs)? (and no, I don't need 100.00% precision
>>> (although it would be nice obviously), I just need to be "pretty close"
>>> for the values of "pretty")
>> 
>> In that case, Cassandra counters are probably fine.
>> 
>>> On the subject of bug report -- I probably will -- but I'll wait a bit
>>> for more info here, perhaps there's some configuration or something that
>>> I just don't know about.
>> 
>> Excepting (throwing an exception) on the replicateOnWrite stage seems
>> pretty unambiguous to me, and unexpected. YMMV?
>> 
>> =Rob
>> 
>> -- 
>> =Robert Coli
>> AIM&GTALK - rcoli@...
>> YAHOO - rcoli.palominob
>> SKYPE - rcoli_palominodb




