[DISCUSSION] User-facing API for managing Maintenance Mode

Sergey Chugunov Tue, 29 Sep 2020 08:04:25 -0700

Hello Ignite dev community,

As the internal implementation of Maintenance Mode [1] nears completion, I
want to discuss one more thing: the user-facing API for managing it (I will
use the control utility for examples).


What should be managed?
When a node enters MM, it may start some automatic actions (like
defragmentation) or wait for a user to intervene and resolve the issue
(as in the case of pds corruption).

So for manually triggered operations like pds cleanup after corruption, we
should give the user a way to actually trigger the operation.
And for long-running automatic operations like defragmentation, it is
reasonable to implement actions such as status and cancel.

At the same time, Maintenance Mode is a supporting feature; it doesn't bring
any value by itself but enables the implementation of other features.
Thus putting it at the center of the API and building all commands around
the main "maintenance" command may not be right.

There are two alternatives: "*Big features deserve their own commands*"
and "*Everything should be unified*". Let's consider both.

Big features deserve their own commands
Here for each big feature we implement its own command. Defragmentation is
a big separate feature so why shouldn't it have its own commands to request
or cancel it?

Examples
    *control.sh defragmentation request-for-node --nodeId <node-id>
[--caches <caches list>]* - defragmentation will be started on the
given node after its restart.
    *control.sh defragmentation status* - prints information about the
status of the ongoing defragmentation.
    *control.sh defragmentation cancel* - cancels the ongoing
defragmentation.

Another command - "maintenance" - will be used for more generic purposes.

Examples
    *control.sh maintenance list-records* - prints information about each
maintenance record (id and name of the record, parameters, description,
current status).
    *control.sh maintenance record-actions --id <record-id>* - prints
information about user-triggered actions available for this record (e.g.
for a pds corruption record it may be "clean-corrupted-files").
    *control.sh maintenance execute-action --id <record-id> --action-name
<action name>* - triggers execution of a particular action and prints the
results.
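
To make the shape of this generic command more concrete, here is a minimal
Java sketch of a model that could sit behind the three subcommands above.
It is only an illustration under my own assumptions: all class, method and
record names are hypothetical and are not taken from the Ignite codebase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model; none of these names come from the Ignite codebase. */
public class MaintenanceSketch {
    /** A maintenance record together with the user-triggered actions it exposes. */
    public static final class MaintenanceRecord {
        final String id;
        final String description;
        // Action name -> code to run; LinkedHashMap keeps the listing order stable.
        final Map<String, Supplier<String>> actions = new LinkedHashMap<>();

        public MaintenanceRecord(String id, String description) {
            this.id = id;
            this.description = description;
        }

        /** Registers a named action and returns this record for chaining. */
        public MaintenanceRecord action(String name, Supplier<String> body) {
            actions.put(name, body);
            return this;
        }
    }

    private final Map<String, MaintenanceRecord> records = new LinkedHashMap<>();

    public void register(MaintenanceRecord rec) {
        records.put(rec.id, rec);
    }

    /** Mirrors "maintenance list-records": ids of all registered records. */
    public List<String> listRecords() {
        return new ArrayList<String>(records.keySet());
    }

    /** Mirrors "maintenance record-actions": action names for one record. */
    public List<String> recordActions(String id) {
        MaintenanceRecord rec = records.get(id);
        if (rec == null)
            return Collections.<String>emptyList();
        return new ArrayList<String>(rec.actions.keySet());
    }

    /** Mirrors "maintenance execute-action": runs the action, returns its result. */
    public String executeAction(String id, String actionName) {
        MaintenanceRecord rec = records.get(id);
        if (rec == null)
            throw new IllegalArgumentException("Unknown record: " + id);
        Supplier<String> action = rec.actions.get(actionName);
        if (action == null)
            throw new IllegalArgumentException("Unknown action: " + actionName);
        return action.get();
    }

    public static void main(String[] args) {
        MaintenanceSketch mm = new MaintenanceSketch();
        // Record id and result string are made up for the example.
        mm.register(new MaintenanceRecord("corrupted-pds", "example record")
            .action("clean-corrupted-files", () -> "files cleaned"));
        System.out.println(mm.listRecords());
        System.out.println(mm.recordActions("corrupted-pds"));
        System.out.println(mm.executeAction("corrupted-pds", "clean-corrupted-files"));
    }
}
```

The point of the sketch is that each record owns its own set of named
actions, so the command itself does not need to know anything about the
feature that created the record.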

*Pros:*

   1. Big features like defragmentation get their own commands and more
   freedom in implementing them.
   2. It is emphasized that maintenance mode is just a supporting thing and
   not a first-class feature (it is not at the center of API).

*Cons:*

   1. Duplication of functionality. The same functions may be available via
   both the general maintenance command and the feature's own command.
   2. Information about a feature may be split between two commands. One
   piece of information is available in the "feature" command, another in
   the "maintenance" command.


Everything should be unified
We can go another way and gather all features that rely on MM under one
unified command.

The API for a node that is already in MM looks complete, logical and
intuitive:
    *control.sh maintenance list-records* - outputs all records that have
to be resolved to finish maintenance.
    *control.sh maintenance record-actions --id <record-id>* - lists all
actions available for the record.
    *control.sh maintenance execute-action --id <record-id> --action-name
<action-name>* - executes the action with the given name (general actions
like "status" or "delete", or a more specific action like
"clean-corrupted-files" for the corrupted pds situation).

But the API for requesting a node to enter maintenance mode becomes more
vague.
    *control.sh maintenance available-operations* - prints all operations
available to request (for instance, defragmentation).
    *control.sh maintenance request-operation --id <operation-id> --params
<operation parameters>* - requests the given operation to start on the next
node restart.
Here we have to distinguish operations that are requested automatically
(like pds corruption) and not show them to the user.
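
One possible way to make that distinction, sketched in Java under my own
assumptions: every operation carries a flag saying whether a user may
request it, and both the listing and the request path check the flag. The
enum, the operation ids and the method names are hypothetical and do not
exist in Ignite.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these names are illustrative and not part of Ignite. */
public class AvailableOperationsSketch {
    /** Operation kinds; the flag marks what a user is allowed to request. */
    public enum Operation {
        DEFRAGMENTATION("defragmentation", true),
        // Registered by the node itself, so it is hidden from the user.
        PDS_CORRUPTION("pds-corruption", false);

        final String id;
        final boolean userRequestable;

        Operation(String id, boolean userRequestable) {
            this.id = id;
            this.userRequestable = userRequestable;
        }
    }

    /** Mirrors "maintenance available-operations": only user-requestable ids. */
    public static List<String> availableOperations() {
        List<String> ids = new ArrayList<String>();
        for (Operation op : Operation.values()) {
            if (op.userRequestable)
                ids.add(op.id);
        }
        return ids;
    }

    /** Mirrors "maintenance request-operation": rejects automatic operations. */
    public static Operation requestOperation(String id) {
        for (Operation op : Operation.values()) {
            if (op.id.equals(id)) {
                if (!op.userRequestable)
                    throw new IllegalArgumentException("Not requestable by a user: " + id);
                return op;
            }
        }
        throw new IllegalArgumentException("Unknown operation: " + id);
    }

    public static void main(String[] args) {
        System.out.println(availableOperations());
        System.out.println(requestOperation("defragmentation"));
    }
}
```

Checking the flag on the request path as well as in the listing means a
user cannot request an automatic operation just by guessing its id.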

*Pros:*

   1. Single API to get information and trigger actions without any
   duplication.


*Cons:*

   1. We restrict big features to the model provided by the maintenance
   command.
   2. In this API we put maintenance at the center although it is nothing
   more than a supporting feature.
   3. The API for requesting maintenance operations feels artificial rather
   than intuitive to me.


So what do you think? What looks better and more intuitive from your
perspective?

I will be glad to hear any feedback on the subject.

Based on the outcome of this discussion, I will create a ticket for the
implementation and include it in IEP-53 [2].

[1] https://issues.apache.org/jira/browse/IGNITE-13366
[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-53%3A+Maintenance+Mode
