Hi Jeff,

this is my third attempt at bootstrapping the node, so I have tried several tricks that might partially explain the output I am posting:
* To make the bootstrap incremental, I have been throttling the streams on all nodes to 1 Mbit/s, and then selectively unthrottling one node at a time, hoping that would unlock some of the routines that compact away redundant data (you'll see that nodetool netstats reports back fewer nodes than nodetool status).
* Since compactions have had a tendency to get stuck in previous bootstraps (hundreds pending but none executing), I have also tried issuing a manual "nodetool compact" on the bootstrapping node (commands sketched below).
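In case the specifics matter, the commands were along these lines (values are in Mbit/s; 0 disables the cap, same knob as stream_throughput_outbound_megabits_per_sec in cassandra.yaml):

    # on every node: cap streaming at 1 Mbit/s
    nodetool setstreamthroughput 1
    # on the one node being selectively unthrottled
    nodetool setstreamthroughput 0
    # on the bootstrapping node: force a major compaction
    nodetool compact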
Having said that, this is the output of the commands you asked for.

Thanks a lot,
Stefano

*nodetool status*

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1

*nodetool netstats -H | grep "Already received" -B 1*

/X.Y.81.4
    Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
--
/X.Y.81.7
    Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
--
/X.Y.81.5
    Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
--
/X.Y.81.2
    Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
--
/X.Y.81.3
    Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
--
/X.Y.81.1
    Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
--
/X.Y.81.6
    Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
--
/X.Y.33.5
    Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total

*nodetool tpstats*

Pool Name                   Active  Pending  Completed  Blocked  All time blocked
MutationStage               0       0        828367015  0        0
ViewMutationStage           0       0        0          0        0
ReadStage                   0       0        0          0        0
RequestResponseStage        0       0        13         0        0
ReadRepairStage             0       0        0          0        0
CounterMutationStage        0       0        0          0        0
MiscStage                   0       0        0          0        0
CompactionExecutor          1       1        12150      0        0
MemtableReclaimMemory       0       0        7368       0        0
PendingRangeCalculator      0       0        14         0        0
GossipStage                 0       0        599329     0        0
SecondaryIndexManagement    0       0        0          0        0
HintsDispatcher             0       0        0          0        0
MigrationStage              0       0        27         0        0
MemtablePostFlush           0       0        8112       0        0
ValidationExecutor          0       0        0          0        0
Sampler                     0       0        0          0        0
MemtableFlushWriter         0       0        7368       0        0
InternalResponseStage       0       0        25         0        0
AntiEntropyStage            0       0        0          0        0
CacheCleanupExecutor        0       0        0          0        0

Message type      Dropped
READ              0
RANGE_SLICE       0
_TRACE            0
HINT              0
MUTATION          1
COUNTER_MUTATION  0
BATCH_STORE       0
BATCH_REMOVE      0
REQUEST_RESPONSE  0
PAGED_RANGE       0
READ_REPAIR       0

*nodetool compactionstats -H*

pending tasks: 776
id                                    compaction type  keyspace    table    completed  total    unit   progress
24d039f2-b1e6-11e7-ac57-3d25e38b2f5c  Compaction       keyspace_1  table_1  4.85 GB    7.67 GB  bytes  63.25%
Active compaction remaining time : n/a
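For completeness, the compaction settings mentioned further down in the thread (throughput unthrottled, 8 compacting threads) boil down to something like:

    # on the bootstrapping node: remove the compaction throughput cap
    nodetool setcompactionthroughput 0
    # in cassandra.yaml (read at startup)
    concurrent_compactors: 8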
On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compactionstats?
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> Hi Jeff,
>
> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>
> Stefano
>
> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> What version?
>>
>> Single disk or JBOD?
>>
>> Vnodes?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far.
>> Based on the source code it seems that this option doesn't affect compactions while bootstrapping.
>>
>> I am getting quite confused, as it seems I am not able to bootstrap a node unless I have at least 6-7 times the disk space used by the other nodes. This is weird. The host I am bootstrapping is using an SSD. Also, compaction throughput is unthrottled (set to 0) and the compacting threads are set to 8. Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>
>> Does anybody know anything else I could try?
>>
>> Cheers,
>> Stefano
>>
>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meanwhile the disk space occupied is already twice the average load I have on the other nodes.
>>>
>>> Feeling more and more puzzled here :S
>>>
>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15) and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approx 1/3 of their SSTables, basically their whole primary range (using RF=3).
>>>>
>>>> Is this expected/normal?
>>>> I was under the impression that only the necessary SSTables were going to be streamed...
>>>>
>>>> Thanks for the help,
>>>> Stefano
>>>>
>>>>
>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>
>>>>>> But if it also streams, it means I'd still be under pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SSTables at the same time, and not because of my current write load.
>>>>>
>>>>> Ah yeah, I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disabling STCS in L0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The option is -Dcassandra.disable_stcs_in_l0=true
>>>>>
>>>>>> I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>
>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. Could also be suffering the problem of streamed SSTables causing large cross-level compactions in the higher levels as well.
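PS: for anyone else hitting this, -Dcassandra.disable_stcs_in_l0=true is a JVM system property, so it has to be in place before the node starts bootstrapping; on the joining node that means something like the following in cassandra-env.sh (assuming that is where your JVM options live):

    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"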