(Should still be able to complete, unless you’re running out of disk or memory or similar, but that’s why it’s streaming more than you expect.)

-- 
Jeff Jirsa
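A minimal sketch of commands for keeping an eye on disk headroom and stream progress on the joining node while it bootstraps (the data directory path below is the stock default and is an assumption about this deployment):

    # How much each peer still has to stream to this node
    nodetool netstats -H | grep -B1 "Already received"

    # Pending and active compactions on the joining node
    nodetool compactionstats -H

    # Disk headroom on the data volume (adjust to your data_file_directories)
    df -h /var/lib/cassandra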
> On Oct 15, 2017, at 1:51 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> You’re adding the new node as rac3.
>
> The rack-aware policy is going to make sure you get the rack diversity you
> asked for by making sure one replica of each partition is in rac3, which is
> going to blow up that instance.
>
> -- 
> Jeff Jirsa
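To make the rack point concrete, a read-only sketch for checking the rack and replication setup being described. It assumes NetworkTopologyStrategy with a rack-aware snitch (which is what the behaviour above implies); the keyspace name and config path are placeholders:

    # Rack assignment per node is the last column of nodetool status
    nodetool status

    # Replication settings for the keyspace (name is a placeholder)
    cqlsh -e "DESCRIBE KEYSPACE keyspace_1;" | grep -i replication

    # Rack advertised by this node when GossipingPropertyFileSnitch is used
    cat /etc/cassandra/cassandra-rackdc.properties

With one replica of each partition forced onto each rack, a lone node in RAC3 ends up holding a replica of essentially every partition in the DC, which is consistent with the joining node's outsized load in the nodetool status output below.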
>> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi Jeff,
>>
>> this is my third attempt at bootstrapping the node, so I have tried several tricks that might partially explain the output I am posting:
>>
>> * To make the bootstrap incremental, I have been throttling the streams on all nodes to 1 Mbit/s, then selectively unthrottling one node at a time, hoping that would unlock some routines compacting away redundant data (you'll see that nodetool netstats reports fewer nodes than nodetool status).
>> * Since compactions have had a tendency to get stuck (hundreds pending but none executing) in previous bootstraps, I've tried issuing a manual "nodetool compact" on the bootstrapping node.
>>
>> Having said that, this is the output of the commands.
>>
>> Thanks a lot,
>> Stefano
>>
>> nodetool status
>> Datacenter: DC1
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address    Load       Tokens  Owns  Host ID                               Rack
>> UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
>> UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
>> UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
>> UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
>> UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
>> UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
>> UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
>> UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
>> UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
>> UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
>> UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
>> UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
>> UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
>> UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
>>
>> nodetool netstats -H | grep "Already received" -B 1
>> /X.Y.81.4
>>     Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
>> --
>> /X.Y.81.7
>>     Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
>> --
>> /X.Y.81.5
>>     Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
>> --
>> /X.Y.81.2
>>     Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
>> --
>> /X.Y.81.3
>>     Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
>> --
>> /X.Y.81.1
>>     Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
>> --
>> /X.Y.81.6
>>     Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
>> --
>> /X.Y.33.5
>>     Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
>>
>> nodetool tpstats
>> Pool Name                    Active  Pending   Completed  Blocked  All time blocked
>> MutationStage                     0        0   828367015        0                 0
>> ViewMutationStage                 0        0           0        0                 0
>> ReadStage                         0        0           0        0                 0
>> RequestResponseStage              0        0          13        0                 0
>> ReadRepairStage                   0        0           0        0                 0
>> CounterMutationStage              0        0           0        0                 0
>> MiscStage                         0        0           0        0                 0
>> CompactionExecutor                1        1       12150        0                 0
>> MemtableReclaimMemory             0        0        7368        0                 0
>> PendingRangeCalculator            0        0          14        0                 0
>> GossipStage                       0        0      599329        0                 0
>> SecondaryIndexManagement          0        0           0        0                 0
>> HintsDispatcher                   0        0           0        0                 0
>> MigrationStage                    0        0          27        0                 0
>> MemtablePostFlush                 0        0        8112        0                 0
>> ValidationExecutor                0        0           0        0                 0
>> Sampler                           0        0           0        0                 0
>> MemtableFlushWriter               0        0        7368        0                 0
>> InternalResponseStage             0        0          25        0                 0
>> AntiEntropyStage                  0        0           0        0                 0
>> CacheCleanupExecutor              0        0           0        0                 0
>>
>> Message type          Dropped
>> READ                        0
>> RANGE_SLICE                 0
>> _TRACE                      0
>> HINT                        0
>> MUTATION                    1
>> COUNTER_MUTATION            0
>> BATCH_STORE                 0
>> BATCH_REMOVE                0
>> REQUEST_RESPONSE            0
>> PAGED_RANGE                 0
>> READ_REPAIR                 0
>>
>> nodetool compactionstats -H
>> pending tasks: 776
>>                                   id   compaction type    keyspace    table  completed    total  unit  progress
>> 24d039f2-b1e6-11e7-ac57-3d25e38b2f5c        Compaction  keyspace_1  table_1    4.85 GB  7.67 GB  bytes   63.25%
>> Active compaction remaining time :   n/a
>>
>>> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compactionstats?
>>>
>>> -- 
>>> Jeff Jirsa
>>>
>>>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>
>>>> Hi Jeff,
>>>>
>>>> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>>>
>>>> Stefano
>>>>
>>>>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>> What version?
>>>>>
>>>>> Single disk or JBOD?
>>>>>
>>>>> Vnodes?
>>>>>
>>>>> -- 
>>>>> Jeff Jirsa
>>>>>
>>>>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far.
>>>>>> Based on the source code, it seems this option doesn't affect compactions while bootstrapping.
>>>>>>
>>>>>> I am getting quite confused, as it seems I am not able to bootstrap a node unless I have at least 6-7 times the disk space used by the other nodes.
>>>>>> This is weird. The host I am bootstrapping uses an SSD, compaction throughput is unthrottled (set to 0), and the compacting threads are set to 8.
>>>>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>>>>>
>>>>>> Does anybody know anything else I could try?
>>>>>>
>>>>>> Cheers,
>>>>>> Stefano
>>>>>>
>>>>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meantime, the disk space used is already twice the average load I have on the other nodes.
>>>>>>>
>>>>>>> Feeling more and more puzzled here :S
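For reference, a sketch of the knobs being exercised in the attempts described above. The values are the ones mentioned in the thread; the 200 figure assumes the stock stream_throughput_outbound_megabits_per_sec default:

    # Throttle outbound streaming on every source node to ~1 Mbit/s ...
    nodetool setstreamthroughput 1

    # ... then lift the limit on one source node at a time
    nodetool setstreamthroughput 200

    # Unthrottle compaction on the joining node (0 disables the limit)
    nodetool setcompactionthroughput 0

    # Check whether the pending compactions are actually draining
    nodetool compactionstats -H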
>>>>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15), and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approximately 1/3 of their SSTables, basically their whole primary range (we use RF=3).
>>>>>>>>
>>>>>>>> Is this expected/normal?
>>>>>>>> I was under the impression only the necessary SSTables were going to be streamed...
>>>>>>>>
>>>>>>>> Thanks for the help,
>>>>>>>> Stefano
>>>>>>>>
>>>>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>>> But if it also streams, it means I'd still be under pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SSTables at the same time, and not because of my current write load.
>>>>>>>>>
>>>>>>>>> Ah yeah, I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disabling STCS in L0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>>>>
>>>>>>>>>> I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>>>>>
>>>>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. You could also be suffering from streamed SSTables causing large cross-level compactions in the higher levels.
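Tying the last two points together, a sketch of how the flag might be applied and its effect checked on the joining node. The config path and keyspace/table names are placeholders, and appending to cassandra-env.sh assumes a package-style install where that script is read at startup:

    # Start the joining node with STCS in L0 disabled
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"' | sudo tee -a /etc/cassandra/cassandra-env.sh

    # After restarting, watch the per-level SSTable distribution of the LCS table
    nodetool cfstats keyspace_1.table_1 | grep -i "SSTables in each level"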