Re: Long running compaction on huge hint table.

varun saluja Sun, 21 May 2017 04:48:08 -0700

Hi All,
  
Can someone please suggest recommendations for write-intensive jobs?


Regards,
Varun Saluja
Sent from my iPhone

> On 17-May-2017, at 3:52 PM, varun saluja <saluj...@gmail.com> wrote:
> 
> Thanks Jeff.
> 
> I took a backup and manually removed the hints with a rolling restart. 
> This brought the cluster back to a stable state.
> 
> Can you please share some recommendations for a write-intensive job? We 
> need to load a dump from Kafka into a 3-node Cassandra cluster. Write TPS per 
> node will be around 7k.
> 
> Can you please suggest any parameter tuning for our use case? We do not 
> want to get stuck in a similar situation, with large compactions of hints or 
> any other table, while we are loading the dump.
> 
> 
> Regards,
> Varun
> 
>> On 17 May 2017 at 09:17, Jeff Jirsa <jji...@gmail.com> wrote:
>> You could also try stopping compaction, but that'll probably take a very 
>> long time as well.
>> 
>> Manually stopping each node (one at a time) and removing the sstables from 
>> only system.hints may be a better option. May want to take a snapshot if 
>> you're very concerned with that data.
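
The one-node-at-a-time procedure described above can be sketched as a dry-run plan. This is a minimal sketch, not a tested runbook: the host names, the snapshot tag, the `service cassandra stop/start` commands, and the `/var/lib/cassandra/data` location are assumptions that depend on the install. The script only builds and prints the commands; it executes nothing.

```python
from typing import List

# Assumptions (not from the thread): default package data directory,
# init-script service name, and an arbitrary snapshot tag.
DATA_DIR = "/var/lib/cassandra/data"
SNAPSHOT_TAG = "pre-hint-cleanup"


def plan_for_node(host: str) -> List[str]:
    """Return the shell commands for one node, in order. Nothing is run."""
    hints_dirs = f"{DATA_DIR}/system/hints-*/"
    return [
        # Optional safety copy of system.hints before deleting anything.
        f"ssh {host} nodetool snapshot -t {SNAPSHOT_TAG} -cf hints system",
        # Flush memtables and stop accepting writes before shutdown.
        f"ssh {host} nodetool drain",
        f"ssh {host} sudo service cassandra stop",
        # Delete only regular files directly inside the hints table
        # directory; -maxdepth 1 leaves the snapshots/ subdirectory intact.
        f"ssh {host} sudo find {hints_dirs} -maxdepth 1 -type f -delete",
        f"ssh {host} sudo service cassandra start",
        # Confirm the node is back (UN) before touching the next one.
        f"ssh {host} nodetool status",
    ]


def rolling_plan(hosts: List[str]) -> List[str]:
    """Concatenate per-node plans so nodes are handled strictly one at a time."""
    plan: List[str] = []
    for host in hosts:
        plan.extend(plan_for_node(host))
    return plan


if __name__ == "__main__":
    # Hypothetical host names for a 3-node cluster.
    for command in rolling_plan(["cass-node1", "cass-node2", "cass-node3"]):
        print(command)
```

Only the hints table directory is touched; every other system table is left alone, and the snapshot can be dropped once the cluster is confirmed healthy.
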
>> 
>> 
>> 
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On May 16, 2017, at 6:53 PM, varun saluja <saluj...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>>  
>>> truncatehints has been running on the nodes for more than 7 hours now. 
>>> Nothing about it appears in the system logs either.
>>> 
>>> And compactionstats reports an increase in the hints total bytes.
>>> 
>>> pending tasks: 1
>>>    compaction type   keyspace   table     completed          total    unit   progress
>>>         Compaction     system   hints   12152557998   869257869352   bytes      1.40%
>>> Active compaction remaining time :   0h27m14s
>>> 
>>> Can anything else be checked here? Will manually deleting the system.hints 
>>> files and restarting the node fix this?
>>> 
>>> 
>>> 
>>> Regards,
>>> Varun Saluja
>>> 
>>>> On 16 May 2017 at 23:29, varun saluja <saluj...@gmail.com> wrote:
>>>> Hi Jeff,
>>>> 
>>>> I ran nodetool truncatehints on all nodes. It has been running for more 
>>>> than 30 minutes now. compactionstats still reports the same status.
>>>> 
>>>> pending tasks: 1
>>>>    compaction type   keyspace   table     completed          total    unit   progress
>>>>         Compaction     system   hints   11189118129   851658989612   bytes      1.31%
>>>> Active compaction remaining time :   0h26m43s
>>>> 
>>>> Will truncatehints take time to complete? I could not see anything 
>>>> related to truncatehints in the system logs.
>>>> 
>>>> Please let me know if anything else can be checked here.
>>>> 
>>>> Regards,
>>>> Varun Saluja 
>>>> 
>>>> 
>>>> 
>>>>> On 16 May 2017 at 20:58, varun saluja <saluj...@gmail.com> wrote:
>>>>> Thanks a lot Jeff.
>>>>> 
>>>>> You have explained this very well. We write with consistency level 
>>>>> LOCAL_QUORUM. We will truncate the hints and run repair thereafter.
>>>>> 
>>>>> I hope this brings the cluster back to a stable state.
>>>>> 
>>>>> Thanks again.
>>>>> 
>>>>> Regards,
>>>>> Varun Saluja
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>> > On 16-May-2017, at 8:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>>>>> >
>>>>> >
>>>>> > In Cassandra versions up to 3.0, hints are stored within a table, where 
>>>>> > the partition key is the host ID of the server for which the hints are 
>>>>> > stored.
>>>>> >
>>>>> > In such a data model, accumulating 800GB of hints is almost certain to 
>>>>> > cause very wide rows, which will in turn cause GC pressure when you 
>>>>> > attempt to read the hints for delivery. This will cause GC pauses, 
>>>>> > which will cause hints to fail to be delivered, which will cause more 
>>>>> > hints to be stored. This is bad.
>>>>> >
>>>>> > In 3.0, hints were rewritten to work around this design flaw. In 2.1, 
>>>>> > your most likely corrective course is to use 'nodetool truncatehints' 
>>>>> > on all servers, followed by 'nodetool repair' to deliver the data you 
>>>>> > lost by truncating the hints.
>>>>> >
>>>>> > NOTE: this is ONLY safe if you wrote with a consistency level stronger 
>>>>> > than CL:ANY. If you wrote this data with CL:ANY, you may lose data if 
>>>>> > you truncate hints.
>>>>> >
>>>>> > - Jeff
>>>>> >
>>>>> >> On 2017-05-16 06:50 (-0700), varun saluja <saluj...@gmail.com> wrote:
>>>>> >> Thanks for the update.
>>>>> >> I can see a lot of I/O wait. This is causing GC and mutation drops.
>>>>> >> But as I mentioned, we do not have high load for now. Hint replays are 
>>>>> >> creating this high disk I/O.
>>>>> >> compactionstats shows very high hint bytes, around 780 GB. Is this 
>>>>> >> normal?
>>>>> >>
>>>>> >> Just to mention, we are using flash disks.
>>>>> >>
>>>>> >> In that case, if I run truncatehints, will it remove or decrease the 
>>>>> >> hint bytes shown in compactionstats? I can trigger a repair thereafter.
>>>>> >> Please let me know if you have any recommendation on this.
>>>>> >>
>>>>> >> Also, the table we loaded from Kafka, which created this many hints 
>>>>> >> and pending compactions, was dropped today, because we have to 
>>>>> >> reload the table once the cluster is stable.
>>>>> >>
>>>>> >> Regards,
>>>>> >> Varun
>>>>> >>
>>>>> >> Sent from my iPhone
>>>>> >>
>>>>> >>> On 16-May-2017, at 6:59 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>>> >>>
>>>>> >>> Yes, but it means the data has to be replicated using repair.
>>>>> >>>
>>>>> >>> Hints are an outcome of unhealthy nodes. Focus on finding why you have 
>>>>> >>> mutation drops: is it the node, I/O, the network, etc.? Ideally you 
>>>>> >>> shouldn't see hints increasing all the time.
>>>>> >>>
>>>>> >>> Sent from my iPhone
>>>>> >>>
>>>>> >>>> On May 16, 2017, at 7:58 AM, varun saluja <saluj...@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Hi Nitan,
>>>>> >>>>
>>>>> >>>> Thanks for the response.
>>>>> >>>>
>>>>> >>>> Yes, I can see mutation drops and an increasing count in system.hints. 
>>>>> >>>> Is there any way I can proceed to truncate the hints, for example 
>>>>> >>>> using nodetool truncatehints?
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Regards,
>>>>> >>>> Varun Saluja
>>>>> >>>>
>>>>> >>>>> On 16 May 2017 at 17:52, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>>> >>>>> Do you see mutation drops?
>>>>> >>>>> SELECT COUNT(*) FROM system.hints; is the count increasing?
>>>>> >>>>>
>>>>> >>>>> Sent from my iPhone
>>>>> >>>>>
>>>>> >>>>>> On May 16, 2017, at 5:52 AM, varun saluja <saluj...@gmail.com> 
>>>>> >>>>>> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi Experts,
>>>>> >>>>>>
>>>>> >>>>>> We are facing an issue on our production cluster. Compaction on the 
>>>>> >>>>>> system.hints table has been running for the last 2 days.
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> pending tasks: 1
>>>>> >>>>>>   compaction type   keyspace   table     completed          total    unit   progress
>>>>> >>>>>>        Compaction     system   hints   20623021829   877874092407   bytes      2.35%
>>>>> >>>>>> Active compaction remaining time :   0h27m15s
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> The active compaction remaining time is shown in minutes, but this 
>>>>> >>>>>> job seems to be running indefinitely.
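
A possible explanation for the estimate that never shrinks: nodetool appears to derive "remaining time" by dividing the remaining bytes by the configured compaction throughput cap, not by the rate actually achieved. The sketch below checks the three compactionstats outputs quoted in this thread under that assumption. The 500 MB/s cap is an inference, chosen because it reproduces all three quoted estimates; it is not stated anywhere in the thread, and the function names are illustrative.

```python
def progress_pct(completed: int, total: int) -> str:
    """Progress column: completed bytes as a percentage of total bytes."""
    return f"{100.0 * completed / total:.2f}%"


def remaining_time(completed: int, total: int, throughput_mb_per_s: int) -> str:
    """Remaining bytes divided by the throughput cap (assumed formula)."""
    secs = (total - completed) // (1024 * 1024 * throughput_mb_per_s)
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h{minutes:02d}m{seconds:02d}s"


# (completed, total) pairs copied from the three outputs in the thread.
SAMPLES = [
    (20623021829, 877874092407),
    (11189118129, 851658989612),
    (12152557998, 869257869352),
]
ASSUMED_CAP_MB_PER_S = 500  # inferred, not confirmed by the thread

if __name__ == "__main__":
    for completed, total in SAMPLES:
        print(progress_pct(completed, total),
              remaining_time(completed, total, ASSUMED_CAP_MB_PER_S))
```

If the estimate is computed this way, it only says how long the remaining bytes would take at the cap, so it stays near 27 minutes for as long as the real compaction rate is far below that cap.
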
>>>>> >>>>>>
>>>>> >>>>>> We have a 3-node cluster on version 2.1.7, and we ran a 
>>>>> >>>>>> write-intensive job last week on a particular table.
>>>>> >>>>>> Compaction on this table finished, but the hints table size is 
>>>>> >>>>>> growing continuously.
>>>>> >>>>>>
>>>>> >>>>>> Can someone please help me?
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Thanks & Regards,
>>>>> >>>>>> Varun Saluja
>>>>> >>>>>>
>>>>> >>>>
>>>>> >>
>>>>> >
>>>>> >
>>>> 
>>> 
>
