So if I remember right, setting compaction_throughput_mb_per_sec to zero effectively disables throttling, which means cleanup and compaction will run as fast as the instance allows. For normal use, I'd recommend capping it at 8 or 16 (MB/s).
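To put an 8 or 16 MB/s cap in perspective for nodes holding 2-2.5 TB, here is a rough back-of-envelope sketch in Python. The helper `min_cleanup_hours` is hypothetical and not part of Cassandra or nodetool. It assumes cleanup has to push every byte on the node through the compaction throttle once, that the throttle is the only bottleneck (it is shared with any other running compactions), and that 1 GB = 1000 MB. Real cleanup behaviour may differ.

```python
def min_cleanup_hours(data_gb: float, throttle_mb_per_sec: float) -> float:
    """Throttle-imposed lower bound on cleanup time, in hours.

    Assumption (not Cassandra's documented behaviour): every byte of on-disk
    data passes through the compaction throttle exactly once.
    A throttle of 0 means "unthrottled", so the throttle imposes no bound.
    """
    if throttle_mb_per_sec <= 0:
        return 0.0  # unthrottled: disk/CPU speed, not the cap, sets the pace
    seconds = data_gb * 1000 / throttle_mb_per_sec  # 1 GB treated as 1000 MB
    return seconds / 3600


# Compare the suggested caps against the current setting of 0 for a 2.5 TB node.
for cap in (8, 16, 0):
    print(cap, round(min_cleanup_hours(2500, cap), 1))
```

Under these assumptions a 2.5 TB node would need at least about 43 hours at 16 MB/s and about 87 hours at 8 MB/s, so for a one-off cleanup on nodes this size it may be worth raising the cap temporarily with `nodetool setcompactionthroughput` and restoring it afterwards.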
Aaron

On Thu, Feb 16, 2023 at 9:43 AM Marc Hoppins <marc.hopp...@eset.com> wrote:

> compaction_throughput_mb_per_sec is 0 in cassandra.yaml. Is setting it in
> nodetool going to provide any increase?
>
> *From:* Durity, Sean R via user <user@cassandra.apache.org>
> *Sent:* Thursday, February 16, 2023 4:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Cleanup
>
> Cleanup is constrained (throttled) by compaction throughput. If your system
> can handle it, you can increase that throughput (nodetool
> setcompactionthroughput) for the cleanup in order to reduce the total time.
>
> It is a node-isolated operation, not cluster-involved. I often run cleanup
> on all nodes in a DC at the same time. Think of it as compaction and
> consider your cluster performance/workload/timelines accordingly.
>
> Sean R. Durity
>
> *From:* manish khandelwal <manishkhandelwa...@gmail.com>
> *Sent:* Thursday, February 16, 2023 5:05 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cleanup
>
> There is no advantage to running cleanup if no new nodes have been
> introduced, so running it regularly will not reduce cleanup time when you
> do add new nodes.
>
> Cleanup is local to the node, so network bandwidth should have no effect
> on cleanup time.
>
> Don't ignore cleanup, though: skipping it leaves disk space occupied by
> data the node no longer needs.
>
> You should plan to run cleanup during a lean (low-traffic) period. You can
> also pass keyspace and table names as arguments to schedule it in stages so
> that the I/O pressure stays manageable.
>
> Regards
> Manish
>
> On Thu, Feb 16, 2023 at 3:12 PM Marc Hoppins <marc.hopp...@eset.com> wrote:
>
> > Hulloa all,
> >
> > I read something about adding new nodes where the recommendation was to
> > run cleanup on the existing nodes after adding a new node, to remove data
> > for token ranges they no longer own.
> >
> > I timed this way back when we only had ~20 GB of data per node, and it
> > took approx. 5 minutes per node. After adding a node on Tuesday, I figured
> > I'd run cleanup.
> >
> > Per node, it is now taking 6+ hours, as we have 2-2.5 TB per node.
> >
> > Should we be running cleanup regularly regardless of whether or not new
> > nodes have been added? Would that reduce cleanup times when we do add new
> > nodes?
> >
> > If we double the network bandwidth, can we effectively shorten this
> > lengthy cleanup?
> >
> > Maybe just ignore cleanup entirely?
> >
> > I appreciate that cleanup will increase the load, but running cleanup on
> > one node at a time seems impractical. How many nodes (per rack) should we
> > limit cleanup to at the same time?
> >
> > More experienced suggestions would be most appreciated.
> >
> > Marc
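Marc's two timings can be sanity-checked with simple linear scaling. The Python sketch below uses hypothetical helpers (`observed_rate_mb_per_sec`, `projected_hours`), treats 1 GB as 1000 MB, and assumes cleanup time is proportional to the amount of data per node and that the earlier ~20 GB / ~5 minute run was representative. It derives an effective rate from that run and projects it to 2 and 2.5 TB.

```python
def observed_rate_mb_per_sec(data_gb: float, minutes: float) -> float:
    """Effective cleanup rate implied by one timed run (1 GB treated as 1000 MB)."""
    return data_gb * 1000 / (minutes * 60)


def projected_hours(data_gb: float, rate_mb_per_sec: float) -> float:
    """Projected cleanup time in hours, assuming time scales linearly with data size."""
    return data_gb * 1000 / rate_mb_per_sec / 3600


# Earlier measurement from the thread: ~20 GB per node cleaned up in ~5 minutes.
rate = observed_rate_mb_per_sec(20, 5)
for size_gb in (2000, 2500):
    print(f"{size_gb} GB -> {projected_hours(size_gb, rate):.1f} h")
```

The projection of roughly 8 to 10 hours is in the same range as the 6+ hours reported, so the jump from minutes to hours is about what linear growth in data per node would predict.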