Nice catch! I'd completely overlooked it. Thanks a lot!

Stefano
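For context, a minimal sketch of the kind of keyspace definition that produces the behaviour Jeff describes below; the keyspace name and the snitch setup are assumptions for illustration, not taken from this thread:

    -- Hypothetical keyspace: with NetworkTopologyStrategy and RF=3 in DC1,
    -- replica placement is rack-aware, so Cassandra tries to put each of the
    -- three replicas on a different rack. If RAC3 contains only the single
    -- joining node, that node ends up holding one replica of every partition
    -- in DC1, i.e. roughly the whole dataset divided by RF.
    CREATE KEYSPACE example_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    -- The rack itself is normally declared per node, e.g. in
    -- cassandra-rackdc.properties when GossipingPropertyFileSnitch is used:
    --   dc=DC1
    --   rack=RAC3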
On Sun, 15 Oct 2017 at 22:14, Jeff Jirsa <jji...@gmail.com> wrote:

> (Should still be able to complete, unless you're running out of disk or memory or similar, but that's why it's streaming more than you expect)
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:51 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> You're adding the new node as rac3.
>
> The rack-aware policy is going to give you the rack diversity you asked for by putting one replica of each partition in rac3, which is going to blow up that instance.
>
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> Hi Jeff,
>
> This is my third attempt at bootstrapping the node, so I have tried several tricks that might partially explain the output I am posting.
>
> * To make the bootstrap incremental, I have been throttling the streams on all nodes to 1 Mbit/s. I have been selectively unthrottling one node at a time, hoping that would unlock some routines compacting away redundant data (you'll see that nodetool netstats reports back fewer nodes than nodetool status).
> * Since compactions have had a tendency to get stuck (hundreds pending but none executing) in previous bootstraps, I've tried issuing a manual "nodetool compact" on the bootstrapping node.
>
> Having said that, here is the output of the commands.
>
> Thanks a lot,
> Stefano
>
> *nodetool status*
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns  Host ID                               Rack
> UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
>
>
> *nodetool netstats -H | grep "Already received" -B 1*
> /X.Y.81.4
>     Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
> --
> /X.Y.81.7
>     Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
> --
> /X.Y.81.5
>     Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
> --
> /X.Y.81.2
>     Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
> --
> /X.Y.81.3
>     Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
> --
> /X.Y.81.1
>     Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
> --
> /X.Y.81.6
>     Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
> --
> /X.Y.33.5
>     Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
>
> *nodetool tpstats*
> Pool Name                     Active  Pending  Completed  Blocked  All time blocked
> MutationStage                      0        0  828367015        0                 0
> ViewMutationStage                  0        0          0        0                 0
> ReadStage                          0        0          0        0                 0
> RequestResponseStage               0        0         13        0                 0
> ReadRepairStage                    0        0          0        0                 0
> CounterMutationStage               0        0          0        0                 0
> MiscStage                          0        0          0        0                 0
> CompactionExecutor                 1        1      12150        0                 0
> MemtableReclaimMemory              0        0       7368        0                 0
> PendingRangeCalculator             0        0         14        0                 0
> GossipStage                        0        0     599329        0                 0
> SecondaryIndexManagement           0        0          0        0                 0
> HintsDispatcher                    0        0          0        0                 0
> MigrationStage                     0        0         27        0                 0
> MemtablePostFlush                  0        0       8112        0                 0
> ValidationExecutor                 0        0          0        0                 0
> Sampler                            0        0          0        0                 0
> MemtableFlushWriter                0        0       7368        0                 0
> InternalResponseStage              0        0         25        0                 0
> AntiEntropyStage                   0        0          0        0                 0
> CacheCleanupExecutor               0        0          0        0                 0
>
> Message type      Dropped
> READ                    0
> RANGE_SLICE             0
> _TRACE                  0
> HINT                    0
> MUTATION                1
> COUNTER_MUTATION        0
> BATCH_STORE             0
> BATCH_REMOVE            0
> REQUEST_RESPONSE        0
> PAGED_RANGE             0
> READ_REPAIR             0
>
> *nodetool compactionstats -H*
> pending tasks: 776
>                                   id  compaction type    keyspace    table  completed    total  unit  progress
> 24d039f2-b1e6-11e7-ac57-3d25e38b2f5c       Compaction  keyspace_1  table_1    4.85 GB  7.67 GB  bytes    63.25%
> Active compaction remaining time :   n/a
>
>
> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compactionstats?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi Jeff,
>>
>> That would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>
>> Stefano
>>
>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> What version?
>>>
>>> Single disk or JBOD?
>>>
>>> Vnodes?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far.
>>> Based on the source code, it seems that this option doesn't affect compactions while bootstrapping.
>>>
>>> I am getting quite confused, as it seems I am not able to bootstrap a node unless I have at least 6 to 7 times the disk space used by the other nodes.
>>> This is weird. The host I am bootstrapping is using an SSD, compaction throughput is unthrottled (set to 0), and the compacting threads are set to 8.
>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>>
>>> Does anybody know anything else I could try?
>>>
>>> Cheers,
>>> Stefano
>>>
>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>>> Another little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meantime, the disk space used is already twice the average load I have on the other nodes.
>>>>
>>>> Feeling more and more puzzled here :S
>>>>
>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>
>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15), and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approximately 1/3 of their SSTables, basically their whole primary range (we use RF=3).
>>>>>
>>>>> Is this expected/normal?
>>>>> I was under the impression that only the necessary SSTables were going to be streamed...
>>>>>
>>>>> Thanks for the help,
>>>>> Stefano
>>>>>
>>>>>
>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>>
>>>>>>> But if it also streams, it means I'd still be under pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SSTables at the same time, and not because of my current write load.
>>>>>>
>>>>>> Ah, yeah, I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disabling STCS in L0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>
>>>>>>> I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>>
>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. You could also be suffering from streamed SSTables causing large cross-level compactions in the higher levels as well.
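For reference, a sketch of how the settings discussed above are typically applied; whether you edit cassandra-env.sh or jvm.options, and where those files live, depends on the Cassandra packaging, so treat this as an assumption rather than a verified recipe for this cluster:

    # Disable size-tiered compaction inside L0 on the joining node
    # (the flag kurt mentions above); add it to the JVM options before
    # starting the node, e.g. in cassandra-env.sh on 3.0.x:
    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

    # Runtime throttles referenced earlier in the thread. Note the units:
    # setstreamthroughput is in megabits/s (so "1" matches the 1 Mbit/s
    # throttle Stefano describes), setcompactionthroughput is in MB/s,
    # and 0 means unthrottled in both cases.
    nodetool setstreamthroughput 1
    nodetool setcompactionthroughput 0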