If I remember right, setting compaction_throughput_mb_per_sec to zero
disables throttling, which means cleanup and compaction will run as fast
as the instance allows.  For normal use, I'd recommend capping it at 8 or
16 MB/s.
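
For a rough sense of what a cap means for cleanup time, here is a
back-of-envelope sketch in Python.  It assumes cleanup has to rewrite every
byte on the node and that the throughput cap is the only bottleneck.
Neither is guaranteed, so treat the results as an upper bound.  The
function names are made up for illustration.

```python
def cleanup_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Rough hours needed to rewrite data_tb terabytes at a sustained MB/s rate."""
    if throughput_mb_per_s <= 0:
        # 0 means "unthrottled" in Cassandra, so there is no cap to divide by.
        raise ValueError("pass a positive rate; 0 means unthrottled")
    data_mb = data_tb * 1024 * 1024  # TB -> MB, binary units
    return data_mb / throughput_mb_per_s / 3600


def implied_throughput(data_tb: float, hours: float) -> float:
    """Average MB/s implied by rewriting data_tb terabytes in the given hours."""
    return data_tb * 1024 * 1024 / (hours * 3600)


if __name__ == "__main__":
    # Figures from this thread: up to 2.5T per node, 6+ hours per cleanup.
    for cap in (8, 16):
        print(f"{cap} MB/s cap: ~{cleanup_hours(2.5, cap):.0f} h for 2.5 TB")
    print(f"6 h for 2.5 TB implies ~{implied_throughput(2.5, 6):.0f} MB/s")
```

Under those assumptions a 16 MB/s cap would stretch a 2.5T cleanup to
roughly 45 hours, while the 6 hours reported works out to around 120 MB/s,
which is consistent with the node already running unthrottled.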

Aaron


On Thu, Feb 16, 2023 at 9:43 AM Marc Hoppins <marc.hopp...@eset.com> wrote:

> compaction_throughput_mb_per_sec is already 0 in cassandra.yaml. Is
> setting it via nodetool going to provide any increase?
>
>
>
> *From:* Durity, Sean R via user <user@cassandra.apache.org>
> *Sent:* Thursday, February 16, 2023 4:20 PM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Cleanup
>
>
>
> Clean-up is constrained/throttled by compactionthroughput. If your system
> can handle it, you can increase that throughput (nodetool
> setcompactionthroughput) for the clean-up in order to reduce the total time.
>
>
>
> It is a node-isolated operation, not cluster-involved. I often run clean
> up on all nodes in a DC at the same time. Think of it as compaction and
> consider your cluster performance/workload/timelines accordingly.
>
>
>
> Sean R. Durity
>
>
>
> *From:* manish khandelwal <manishkhandelwa...@gmail.com>
> *Sent:* Thursday, February 16, 2023 5:05 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Cleanup
>
>
>
> There is no advantage to running cleanup if no new nodes have been
> introduced, so running it regularly will not shorten cleanup after you
> add a node.
>
>
>
> Cleanup is local to each node, so network bandwidth has no effect on
> cleanup time.
>
>
>
> Don't ignore cleanup: data the node no longer owns stays on disk and
> occupies space for no purpose.
>
>
>
> You should plan to run cleanup in a lean (low-traffic) period. You can
> also pass keyspace and table names to cleanup to run it in smaller pieces
> and keep I/O pressure down.
>
>
>
>
>
> Regards
>
> Manish
>
>
>
> On Thu, Feb 16, 2023 at 3:12 PM Marc Hoppins <marc.hopp...@eset.com>
> wrote:
>
> Hulloa all,
>
>
>
> I read a thing re. adding new nodes where the recommendation was to run
> cleanup on the nodes after adding a new node to remove redundant token
> ranges.
>
>
>
> I timed this way back when we only had ~20G of data per node and it took
> approx. 5 mins per node.  After adding a node on Tuesday, I figured I’d run
> cleanup.
>
>
>
> Per node, it is taking 6+ hours now as we have 2-2.5T per node.
>
>
>
> Should we be running cleanup regularly regardless of whether or not new
> nodes have been added?  Would it reduce cleanup times for when we do add
> new nodes?
>
> If we double the network bandwidth can we effectively reduce this lengthy
> cleanup?
>
> Maybe just ignore cleanup entirely?
>
> I appreciate that cleanup will increase the load but running cleanup on
> one node at a time seems impractical.  How many simultaneous nodes (per
> rack) should we limit cleanup to?
>
>
>
> More experienced suggestions would be most appreciated.
>
>
> Marc
>
>
>