My main suggestion would be to monitor the compaction backlog (pending
compactions). If the backlog is growing, you need to either throttle writes,
add more capacity to your cluster, or tune things. There is no simple answer
to tuning, but there are several good guides on the internet to help; this
is my favourite:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html.
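
As a rough sketch (these are standard nodetool commands; the threshold is
just illustrative), you can watch the backlog on each node with:

    # per-node count of outstanding compactions
    nodetool compactionstats | grep 'pending tasks'

    # dropped messages and thread-pool pressure, another early warning
    nodetool tpstats

If "pending tasks" stays above a few dozen for a sustained period, that node
is falling behind its write load.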

Unless something is really badly set up with your cluster, I would guess
that if it got into this state trying to handle your write load, you'll
potentially need additional capacity as well as tuning to meet your needs.

Cheers
Ben

On Sun, 21 May 2017 at 21:47 varun saluja <saluj...@gmail.com> wrote:

> Hi All,
>
> Can someone please suggest any recommendations for write-intensive jobs?
>
> Regards,
> Varun Saluja
> Sent from my iPhone
>
> On 17-May-2017, at 3:52 PM, varun saluja <saluj...@gmail.com> wrote:
>
> Thanks Jeff.
>
> I took a backup and manually removed the hints with a rolling restart.
> This brought the cluster back to a stable state.
>
> Can you please share some recommendations for write-intensive jobs?
> We need to load a dump from Kafka into a 3-node Cassandra cluster, and the
> write TPS per node will be around 7k.
>
> Can you please suggest any parameter tuning for our use case here? We do
> not want to get stuck in a similar situation of large compactions on the
> hints table, or on any other table we are loading the dump into.
>
>
> Regards,
> Varun
>
> On 17 May 2017 at 09:17, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> You could also try stopping compaction, but that'll probably take a very
>> long time as well.
>>
>> Manually stopping each node (one at a time) and removing the sstables
>> from only system.hints may be a better option. You may want to take a
>> snapshot first if you're very concerned about that data.
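>>
>> Roughly, per node (a sketch only; the data path assumes a default
>> install, so adjust for your layout):
>>
>>     nodetool snapshot system      # optional safety copy of the system keyspace
>>     nodetool drain                # flush memtables and stop accepting writes
>>     # stop the Cassandra process, e.g. 'service cassandra stop'
>>     rm -f /var/lib/cassandra/data/system/hints-*/system-hints-*
>>     # start Cassandra again, wait for the node to come back, then repeat
>>     # on the next node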
>>
>> --
>> Jeff Jirsa
>>
>>
>> On May 16, 2017, at 6:53 PM, varun saluja <saluj...@gmail.com> wrote:
>>
>> Hi,
>>
>>
>> Truncatehints on the nodes has been running for more than 7 hours now.
>> There is nothing about it even in the system logs.
>>
>> And compactionstats reports an increase in total hints bytes.
>>
>> pending tasks: 1
>>    compaction type   keyspace   table     completed          total   unit   progress
>>         Compaction     system   hints   12152557998   869257869352   bytes      1.40%
>> Active compaction remaining time :   0h27m14s
>>
>> Can anything else be checked here? Will manually deleting the
>> system.hints files and restarting the nodes fix this?
>>
>>
>>
>> Regards,
>> Varun Saluja
>>
>> On 16 May 2017 at 23:29, varun saluja <saluj...@gmail.com> wrote:
>>
>>> Hi Jeff,
>>>
>>> I ran nodetool truncatehints on all nodes. It has been running for more
>>> than 30 mins now, and compactionstats reports the same status.
>>>
>>> pending tasks: 1
>>>    compaction type   keyspace   table     completed          total   unit   progress
>>>         Compaction     system   hints   11189118129   851658989612   bytes      1.31%
>>> Active compaction remaining time :   0h26m43s
>>>
>>> Does truncatehints take time to complete? I could not see anything
>>> related to truncatehints in the system logs.
>>>
>>> Please let me know if anything else can be checked here.
>>>
>>> Regards,
>>> Varun Saluja
>>>
>>>
>>>
>>> On 16 May 2017 at 20:58, varun saluja <saluj...@gmail.com> wrote:
>>>
>>>> Thanks a lot Jeff.
>>>>
>>>> You have explained it very well here. We have consistency at
>>>> LOCAL_QUORUM, so I will follow with truncatehints and repair thereafter.
>>>>
>>>> I hope this brings the cluster back to a stable state.
>>>>
>>>> Thanks again.
>>>>
>>>> Regards,
>>>> Varun Saluja
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 16-May-2017, at 8:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>>>> >
>>>> >
>>>> > In Cassandra versions up to 3.0, hints are stored within a table,
>>>> where the partition key is the host ID of the server for which the hints
>>>> are stored.
>>>> >
>>>> > In such a data model, accumulating 800GB of hints is almost certain
>>>> to cause very wide rows, which will in turn cause GC pressure when you
>>>> attempt to read the hints for delivery. This will cause GC pauses, which
>>>> will cause hints to fail to be delivered, which will cause more hints to be
>>>> stored. This is bad.
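>>>> >
>>>> > For reference, the 2.1-era table looks roughly like this (shown for
>>>> > illustration; check 'DESCRIBE TABLE system.hints' on your own cluster):
>>>> >
>>>> >     CREATE TABLE system.hints (
>>>> >         target_id uuid,        -- host ID of the node the hint is for
>>>> >         hint_id timeuuid,
>>>> >         message_version int,
>>>> >         mutation blob,         -- the undelivered write itself
>>>> >         PRIMARY KEY (target_id, hint_id, message_version)
>>>> >     ) WITH COMPACT STORAGE;
>>>> >
>>>> > Every hint destined for a given node lands in that node's single
>>>> > partition.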
>>>> >
>>>> > In 3.0, hints were rewritten to work around this design flaw. In 2.1,
>>>> your most likely corrective course is to use 'nodetool truncatehints' on
>>>> all servers, followed by 'nodetool repair' to deliver the data you lost by
>>>> truncating the hints.
>>>> >
>>>> > NOTE: this is ONLY safe if you wrote with a consistency level
>>>> stronger than CL:ANY. If you wrote this data with CL:ANY, you may lose data
>>>> if you truncate hints.
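>>>> >
>>>> > Concretely, the sequence is a sketch like:
>>>> >
>>>> >     # on every node:
>>>> >     nodetool truncatehints
>>>> >
>>>> >     # then, one node at a time, re-sync the data the hints would
>>>> >     # have delivered:
>>>> >     nodetool repair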
>>>> >
>>>> > - Jeff
>>>> >
>>>> >> On 2017-05-16 06:50 (-0700), varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >> Thanks for the update.
>>>> >> I can see a lot of I/O waits, which is causing the GC pauses and
>>>> >> mutation drops. But as I mentioned, we do not have a high load for
>>>> >> now; the hint replays are creating this high disk I/O.
>>>> >> compactionstats shows very high hints bytes, around 780 GB. Is this
>>>> >> normal?
>>>> >>
>>>> >> Just mentioning we are using flash disks.
>>>> >>
>>>> >> In that case, if I run truncatehints, will it remove or decrease the
>>>> >> hints bytes shown in compactionstats? I can trigger a repair
>>>> >> thereafter. Please let me know if you have any recommendations.
>>>> >>
>>>> >> Also, the table we dumped from Kafka, which created all these hints
>>>> >> and pending compactions, was dropped today, because we have to
>>>> >> re-dump the table once the cluster is stable.
>>>> >>
>>>> >> Regards,
>>>> >> Varun
>>>> >>
>>>> >> Sent from my iPhone
>>>> >>
>>>> >>> On 16-May-2017, at 6:59 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>> >>>
>>>> >>> Yes, but it means the data has to be replicated using repair.
>>>> >>>
>>>> >>> Hints are an outcome of unhealthy nodes; focus on finding why you
>>>> >>> have mutation drops: is it the node, I/O, or network, etc.? Ideally
>>>> >>> you shouldn't see hints increasing all the time.
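>>>> >>>
>>>> >>> As a starting point (standard nodetool plus OS tooling):
>>>> >>>
>>>> >>>     nodetool tpstats    # the 'Dropped' section shows MUTATION drops
>>>> >>>     iostat -x 5         # shows whether the disks are saturated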
>>>> >>>
>>>> >>> Sent from my iPhone
>>>> >>>
>>>> >>>> On May 16, 2017, at 7:58 AM, varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hi Nitan,
>>>> >>>>
>>>> >>>> Thanks for response.
>>>> >>>>
>>>> >>>> Yes, I can see mutation drops and an increasing count in
>>>> >>>> system.hints. Is there a way I can proceed to truncate the hints,
>>>> >>>> e.g. using nodetool truncatehints?
>>>> >>>>
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Varun Saluja
>>>> >>>>
>>>> >>>>> On 16 May 2017 at 17:52, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>> >>>>> Do you see mutation drops?
>>>> >>>>> Run SELECT count(*) FROM system.hints; is it increasing?
>>>> >>>>>
>>>> >>>>> Sent from my iPhone
>>>> >>>>>
>>>> >>>>>> On May 16, 2017, at 5:52 AM, varun saluja <saluj...@gmail.com>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi Experts,
>>>> >>>>>>
>>>> >>>>>> We are facing an issue on a production cluster. Compaction on the
>>>> >>>>>> system.hints table has been running for the last 2 days.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> pending tasks: 1
>>>> >>>>>>    compaction type   keyspace   table     completed          total   unit   progress
>>>> >>>>>>         Compaction     system   hints   20623021829   877874092407   bytes      2.35%
>>>> >>>>>> Active compaction remaining time :   0h27m15s
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Active compaction remaining time shows minutes, but the job seems
>>>> >>>>>> to be running indefinitely.
>>>> >>>>>>
>>>> >>>>>> We have a 3-node cluster on version 2.1.7, and we ran a
>>>> >>>>>> write-intensive job on a particular table last week.
>>>> >>>>>> Compaction on that table finished, but the hints table size keeps
>>>> >>>>>> growing.
>>>> >>>>>>
>>>> >>>>>> Can someone please help?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Thanks & Regards,
>>>> >>>>>> Varun Saluja
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>