Tom,

You should look at phi_convict_threshold and try increasing its value if
you have too much chatter on your network.

Also, rebuilding the entire node because of an OOM does not make sense.
Could you please post the C* version that you are using and the heap size you
have configured?
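
On the earlier question of spotting a handoff that starts but never
finishes: below is a rough sketch, not part of Cassandra, that scans
system.log lines and reports endpoints with a "Started hinted handoff"
entry but no later "Finished" or "Timed out" entry. The regexes are
assumptions based on the messages quoted in this thread, and
find_unfinished_handoffs is just an illustrative name.

```python
import re

# Patterns are assumptions modelled on the log messages quoted in this thread;
# adjust them if your Cassandra version words the messages differently.
STARTED = re.compile(r"Started hinted handoff for host: (\S+) with IP: /?(\S+)")
FINISHED = re.compile(r"Finished hinted handoff of (\d+) rows to endpoint /?(\S+)")
TIMED_OUT = re.compile(r"Timed out replaying hints to /?([^;\s]+)")


def find_unfinished_handoffs(lines):
    """Return {ip: host_id} for handoffs with a start but no finish or timeout."""
    open_tasks = {}
    for line in lines:
        m = STARTED.search(line)
        if m:
            host_id, ip = m.groups()
            open_tasks[ip] = host_id  # remember the task until it is closed
            continue
        m = FINISHED.search(line)
        if m:
            open_tasks.pop(m.group(2), None)  # normal completion
            continue
        m = TIMED_OUT.search(line)
        if m:
            open_tasks.pop(m.group(1), None)  # aborted, but no longer running
    return open_tasks
```

Pass it an open file object for system.log (or any iterable of log
lines). Whatever it returns is a handoff that is still running or stuck.
It says nothing about how many hints have been delivered so far, since
the log does not record that.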

Thanks
Rahul


On Tue, Dec 3, 2013 at 7:41 PM, Tom van den Berge <t...@drillster.com> wrote:

> Rahul,
>
> This problem occurs every now and then, and currently everything is ok, so
> there are no hints. But whenever it happens, the hints pile up quickly.
> This results in heap problems on the node ("Heap is 0.813462 full..."
> appears many times). This in turn triggers a flush of the 'hints' column
> family to relieve memory pressure (according to the log messages, its
> size varies between 50 and 60MB). But since the HintedHandoffManager is
> reading from the hints CF, it will probably pull it back into a memtable
> again -- that's at least my understanding of how it works.
>
> So I guess that flushing the hints CF while the HintedHandoffManager is
> working on it only makes things worse, and it could be the reason that the
> process never ends.
>
> What I typically see when this happens is that the hints keep piling up,
> and eventually the node comes to a grinding halt (OOM). Then I have to
> rebuild the node entirely (only removing the hints doesn't work).
>
> The reason for hints to start accumulating in the first place might be a
> spike in CF writes that must be replicated to a node in another data
> center. The available bandwidth to that data center might not be able to
> handle the data quickly enough, resulting in stored hints. The
> HintedHandoff task that is started is targeting that remote node.
>
>
> Thanks,
> Tom
>
>
> On Tue, Dec 3, 2013 at 2:22 PM, Rahul Menon <ra...@apigee.com> wrote:
>
>> Tom,
>>
>> Do you know why these hints are piling up? What is the size of the hints
>> cf?
>>
>> Thanks
>> Rahul
>>
>>
>> On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge <t...@drillster.com> wrote:
>>
>>> Hi Rahul,
>>>
>>> Thanks for your reply.
>>>
>>> I have never seen a message like "Timed out replaying hints to...", which
>>> is a good thing then, I suppose ;)
>>>
>>> Normally, I do see the "Finished hinted handoff..." log message.
>>> However, every now and then this message is not logged, not even after
>>> several hours. This is the problem I'm trying to solve.
>>>
>>> The log messages you describe are quite coarse-grained; they only tell
>>> you that a task has started or finished, but not how this task is
>>> progressing. And that's exactly what I would like to know if I see that a
>>> task has started, but has not finished after a reasonable amount of time.
>>>
>>> So I guess the only way to learn the progress is to look inside the
>>> 'hints' column family then. I'll give that a try.
>>>
>>>
>>> Thanks,
>>> Tom
>>>
>>>
>>> On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon <ra...@apigee.com> wrote:
>>>
>>>> Tom,
>>>>
>>>> You should check the size of the hints column family to determine how
>>>> many hints are present. The hints CF is a super column family and its keys
>>>> are destination tokens. You could look at it if you would like.
>>>>
>>>> Hint sends and timeouts are logged; you should be seeing something
>>>> like
>>>>
>>>> Timed out replaying hints to {}; aborting ({} delivered)
>>>>
>>>> OR
>>>>
>>>> Finished hinted handoff of {} rows to endpoint {}
>>>>
>>>> Thanks
>>>> Rahul
>>>>
>>>>
>>>> On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge <t...@drillster.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to monitor the progress of a hinted handoff task?
>>>>>
>>>>> I found the following two mbeans providing some info:
>>>>>
>>>>> org.apache.cassandra.internal:type=HintedHandoff, which tells me that
>>>>> there is 1 active task, and
>>>>> org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
>>>>> which quite often gives a timeout when executed.
>>>>>
>>>>> Ideally, I would like to see how many hints have been sent (e.g. over
>>>>> the last minute or so), and how many hints are still to be sent (although
>>>>> I assume that's what countPendingHints normally does?)
>>>>>
>>>>> I'm experiencing hinted handoff tasks that are started, but never
>>>>> finish, so I would like to know what the task is doing.
>>>>>
>>>>> My log shows this:
>>>>>
>>>>> INFO [HintedHandoff:1] 2013-12-02
>>>>> 13:49:05,325 HintedHandOffManager.java (line 297) Started hinted handoff
>>>>> for host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
>>>>> (nothing more for [HintedHandoff:1])
>>>>>
>>>>> The node is up and running, the network connection is ok, no gossip
>>>>> messages appear in the logs.
>>>>>
>>>>> Any idea is welcome.
>>>>> (Cassandra 1.2.3)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Drillster BV
>>>>> Middenburcht 136
>>>>> 3452MT Vleuten
>>>>> Netherlands
>>>>>
>>>>> +31 30 755 5330
>>>>>
>>>>> Open your free account at www.drillster.com
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
