Jaydeep,
  your replies address my main concerns; there are a few questions out of
curiosity, with replies inline below…

> > Without any per-table scheduling and history (IIUC), a node would have
> > to restart the repairs for all keyspaces and tables.
>
> The above-mentioned quote should work fine and will make sure the bad
> tables/keyspaces are skipped, allowing the good keyspaces/tables to proceed
> on a node as long as the Cassandra JVM itself keeps running. If the JVM
> keeps crashing, then repairs will restart all over again, but fixing the
> JVM crashes would be the more significant issue, and that does not happen
> regularly, IMO.
>


Repairs causing a node to OOM are not unusual.  I've been working with a
customer in exactly this situation for the past few weeks.  Getting fixes
out, or mitigating the problem, is not always as quick as one hopes (see my
previous comment about how the repair_session_size setting gets easily
clobbered today).  This situation would be much improved if table priority
and tracking were added to the system_distributed table(s).



> If an admin sets some nodes on a priority queue, those nodes will be
> repaired ahead of the scheduler's own list. If an admin tags some nodes on
> the emergency list, then those nodes will be repaired immediately.
> Basically, an admin tells the scheduler, "*Just do what I say instead of
> using your list of nodes*".
>


Does the emergency list then imply not using --partitioner-range?
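
To check my mental model, here is a rough sketch of the kind of
admin-facing table I could imagine backing this; the table and column
names are purely hypothetical, not from the actual design being proposed:

    -- Hypothetical sketch only; table and column names are my guesses,
    -- not the proposed schema.
    CREATE TABLE system_distributed.repair_priority (
        host_id  uuid,       -- node the admin wants repaired out of order
        level    text,       -- 'priority' (ahead of list) or 'emergency' (now)
        added_at timestamp,
        PRIMARY KEY (host_id)
    );

    -- An admin tags a node for immediate repair, overriding the scheduler:
    INSERT INTO system_distributed.repair_priority (host_id, level, added_at)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'emergency',
            toTimestamp(now()));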


> > I am also curious as to how the impact of these tables changes as we
> > address (1) and (2).
>
> Quite a lot of (1) & (2) can be addressed by just adding a new CQL
> property, which won't even touch these metadata tables. If we need to,
> depending on the design for (1) & (2), it can be addressed by adding new
> columns and/or a new metadata table.
>

For per-table custom priorities and tracking, it sounds like adding a
clustering column.  The number of records would then grow from roughly the
number of nodes in the cluster to roughly the number of nodes multiplied by
the number of tables in the cluster.  We see clusters with up to a thousand
tables all too often, despite strong recommendations not to go over two
hundred.  Do you see any concern here?
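
To make the cardinality concern concrete, here is a rough sketch of the
kind of schema change I am picturing; the table and column names are
hypothetical, not the actual metadata tables:

    -- Hypothetical sketch; names are my guesses, not the actual schema.
    -- Today: roughly one row per node.
    CREATE TABLE system_distributed.auto_repair_history (
        host_id     uuid,
        last_repair timestamp,
        PRIMARY KEY (host_id)
    );

    -- With per-table priorities and tracking, the table becomes part of
    -- the clustering key, so the row count grows to ~nodes x tables, e.g.
    -- 100 nodes x 1000 tables = ~100,000 rows instead of ~100.
    CREATE TABLE system_distributed.auto_repair_history_by_table (
        host_id       uuid,
        keyspace_name text,
        table_name    text,
        priority      int,        -- per-table custom priority
        last_repair   timestamp,
        PRIMARY KEY (host_id, keyspace_name, table_name)
    );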

Also, in which versions will we be able to introduce such improvements?
Will we be waiting until the next major release?  Playing around with the
schema of system tables in release branches is not much fun.
