quietness of full nodetool repair on large dataset

Mitch Gitman Thu, 28 Sep 2017 20:24:32 -0700

I'm on Apache Cassandra 3.10. I'm interested in moving over to Reaper for
repairs, but in the meantime, I want to get nodetool repair working a
little more gracefully.

What I'm noticing is that when I run a repair for the first time with the
--full option after a large initial load of data, the nodetool client reports
that it's starting a repair job and then produces no output for not just
minutes but several hours. This causes SSH inactivity timeouts. I
have tried running the repair with the --trace option, but then that leads
to the other extreme where there's just a torrent of output, scarcely any
of which I'll typically need.

As a literal solution to my SSH inactivity timeouts, I could extend the
timeouts, or I could do some scripting jujitsu with
StrictHostKeyChecking=no and a loop that spits some arbitrary output until
the command finishes. But even if the timeouts were no concern, the sheer
unresponsiveness is apt to make an operator nervous. And I'd like to think
there's a Goldilocks way to run a full nodetool repair on a large dataset
where it's just a bit more responsive without going all TMI. Thoughts?
Anyone else notice this?
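For concreteness, here is a minimal sketch of the "loop that spits some arbitrary output until the command finishes" workaround, assuming Python 3 is available on the host you SSH into. The function name run_with_heartbeat, its interval and out parameters, and the keyspace name in the usage note are hypothetical, not part of nodetool. It only keeps the session from looking idle; the heartbeat says nothing about how far the repair has actually progressed.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=60.0, out=None):
    """Run cmd to completion, printing a heartbeat line every `interval` seconds.

    The child's stdout/stderr are inherited, so anything it prints itself
    still appears. The heartbeat is printed on a fixed schedule while the
    child is running, whether or not the child has produced output.
    Returns the child's exit code.
    """
    out = out if out is not None else sys.stdout
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # Returns the exit code as soon as the child finishes.
            return proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            # Child still running: emit a line so the session is not idle.
            elapsed = int(time.monotonic() - start)
            print(f"[heartbeat] {cmd[0]} still running after {elapsed}s",
                  file=out, flush=True)
```

For example, `run_with_heartbeat(["nodetool", "repair", "--full", "my_keyspace"], interval=60)` would print one heartbeat line per minute until nodetool exits and then return nodetool's exit code ("my_keyspace" is a placeholder).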
