Nice catch! I'd completely overlooked it. Thanks a lot!

Stefano
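For context, a minimal sketch of the kind of keyspace definition that produces the behaviour Jeff describes below; the keyspace name and the snitch setup are assumptions for illustration, not taken from this thread:

    -- Hypothetical keyspace: with NetworkTopologyStrategy and RF=3 in DC1,
    -- replica placement is rack-aware, so Cassandra tries to put each of the
    -- three replicas on a different rack. If RAC3 contains only the single
    -- joining node, that node ends up holding one replica of every partition
    -- in DC1, i.e. roughly the whole dataset divided by RF.
    CREATE KEYSPACE example_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    -- The rack itself is normally declared per node, e.g. in
    -- cassandra-rackdc.properties when GossipingPropertyFileSnitch is used:
    --   dc=DC1
    --   rack=RAC3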
On Sun, 15 Oct 2017 at 22:14, Jeff Jirsa <jji...@gmail.com> wrote:

> (Should still be able to complete, unless you're running out of disk or memory or similar, but that's why it's streaming more than you expect)
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:51 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> You're adding the new node as rac3.
>
> The rack-aware policy is going to give you the rack diversity you asked for by putting one replica of each partition in rac3, which is going to blow up that instance.
>
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> Hi Jeff,
>
> This is my third attempt at bootstrapping the node, so I have tried several tricks that might partially explain the output I am posting.
>
> * To make the bootstrap incremental, I have been throttling the streams on all nodes to 1 Mbit/s. I have been selectively unthrottling one node at a time, hoping that would unlock some routines compacting away redundant data (you'll see that nodetool netstats reports back fewer nodes than nodetool status).
> * Since compactions have had a tendency to get stuck (hundreds pending but none executing) in previous bootstraps, I've tried issuing a manual "nodetool compact" on the bootstrapping node.
>
> Having said that, here is the output of the commands.
>
> Thanks a lot,
> Stefano
>
> *nodetool status*
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns  Host ID                               Rack
> UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
>
>
> *nodetool netstats -H | grep "Already received" -B 1*
> /X.Y.81.4
>     Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
> --
> /X.Y.81.7
>     Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
> --
> /X.Y.81.5
>     Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
> --
> /X.Y.81.2
>     Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
> --
> /X.Y.81.3
>     Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
> --
> /X.Y.81.1
>     Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
> --
> /X.Y.81.6
>     Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
> --
> /X.Y.33.5
>     Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
>
> *nodetool tpstats*
> Pool Name                     Active  Pending  Completed  Blocked  All time blocked
> MutationStage                      0        0  828367015        0                 0
> ViewMutationStage                  0        0          0        0                 0
> ReadStage                          0        0          0        0                 0
> RequestResponseStage               0        0         13        0                 0
> ReadRepairStage                    0        0          0        0                 0
> CounterMutationStage               0        0          0        0                 0
> MiscStage                          0        0          0        0                 0
> CompactionExecutor                 1        1      12150        0                 0
> MemtableReclaimMemory              0        0       7368        0                 0
> PendingRangeCalculator             0        0         14        0                 0
> GossipStage                        0        0     599329        0                 0
> SecondaryIndexManagement           0        0          0        0                 0
> HintsDispatcher                    0        0          0        0                 0
> MigrationStage                     0        0         27        0                 0
> MemtablePostFlush                  0        0       8112        0                 0
> ValidationExecutor                 0        0          0        0                 0
> Sampler                            0        0          0        0                 0
> MemtableFlushWriter                0        0       7368        0                 0
> InternalResponseStage              0        0         25        0                 0
> AntiEntropyStage                   0        0          0        0                 0
> CacheCleanupExecutor               0        0          0        0                 0
>
> Message type      Dropped
> READ                    0
> RANGE_SLICE             0
> _TRACE                  0
> HINT                    0
> MUTATION                1
> COUNTER_MUTATION        0
> BATCH_STORE             0
> BATCH_REMOVE            0
> REQUEST_RESPONSE        0
> PAGED_RANGE             0
> READ_REPAIR             0
>
> *nodetool compactionstats -H*
> pending tasks: 776
>                                   id  compaction type    keyspace    table  completed    total  unit  progress
> 24d039f2-b1e6-11e7-ac57-3d25e38b2f5c       Compaction  keyspace_1  table_1    4.85 GB  7.67 GB  bytes    63.25%
> Active compaction remaining time :   n/a
>
>
> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compactionstats?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi Jeff,
>>
>> That would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>
>> Stefano
>>
>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> What version?
>>>
>>> Single disk or JBOD?
>>>
>>> Vnodes?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far.
>>> Based on the source code, it seems that this option doesn't affect compactions while bootstrapping.
>>>
>>> I am getting quite confused, as it seems I am not able to bootstrap a node unless I have at least 6 to 7 times the disk space used by the other nodes.
>>> This is weird. The host I am bootstrapping is using an SSD, compaction throughput is unthrottled (set to 0), and the compacting threads are set to 8.
>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>>
>>> Does anybody know anything else I could try?
>>>
>>> Cheers,
>>> Stefano
>>>
>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>>> Another little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meantime, the disk space used is already twice the average load I have on the other nodes.
>>>>
>>>> Feeling more and more puzzled here :S
>>>>
>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>
>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15), and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approximately 1/3 of their SSTables, basically their whole primary range (we use RF=3).
>>>>>
>>>>> Is this expected/normal?
>>>>> I was under the impression that only the necessary SSTables were going to be streamed...
>>>>>
>>>>> Thanks for the help,
>>>>> Stefano
>>>>>
>>>>>
>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>>
>>>>>>> But if it also streams, it means I'd still be under pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SSTables at the same time, and not because of my current write load.
>>>>>>
>>>>>> Ah, yeah, I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disabling STCS in L0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>
>>>>>>> I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>>
>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. You could also be suffering from streamed SSTables causing large cross-level compactions in the higher levels as well.
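For reference, a sketch of how the settings discussed above are typically applied; whether you edit cassandra-env.sh or jvm.options, and where those files live, depends on the Cassandra packaging, so treat this as an assumption rather than a verified recipe for this cluster:

    # Disable size-tiered compaction inside L0 on the joining node
    # (the flag kurt mentions above); add it to the JVM options before
    # starting the node, e.g. in cassandra-env.sh on 3.0.x:
    JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"

    # Runtime throttles referenced earlier in the thread. Note the units:
    # setstreamthroughput is in megabits/s (so "1" matches the 1 Mbit/s
    # throttle Stefano describes), setcompactionthroughput is in MB/s,
    # and 0 means unthrottled in both cases.
    nodetool setstreamthroughput 1
    nodetool setcompactionthroughput 0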