Hello everyone,

It seems to me that we need another consensus to make the
SettingsTable virtual table updatable. There is an issue with
validating configuration properties that blocks our implementation
with the virtual table.

A short example of validating the values loaded from the YAML file:
- the DurationSpec.LongMillisecondsBound itself requires input quantity >= 0;
- the read_request_timeout Config field with type
DurationSpec.LongMillisecondsBound requires quantity >=
LOWEST_ACCEPTED_TIMEOUT (10ms);

When the read_request_timeout field is modified using JMX, only a
DurationSpec.LongMillisecondsBound type validation is performed and
there is no LOWEST_ACCEPTED_TIMEOUT validation. If we implement the
SettingsTable properties validation in the same way, we just add
another discrepancy.


If we go a little deeper, we are currently validating a configuration
property in the following parts of the code, which makes things even
worse:
- in a property type itself if it's not primitive, e.g.
DataStorageSpec#validateQuantity;
- rarely in nested configuration classes e.g.
AuditLogOptions#validateCategories;
- during the configuration load from yaml-file for null, and non-null,
see YamlConfigurationLoader.PropertiesChecker#check;
- during applying the configuration, e.g. DatabaseDescriptor#applySimpleConfig;
- in DatabaseDescriptor setter methods e.g.
DatabaseDescriptor#setDenylistMaxKeysTotal;
- inside custom classes e.g. SSLFactory#validateSslContext;
- rarely inside JMX methods itself, e.g. StorageService#setRepairRpcTimeout;


To use the same validation path for configuration properties that are
going to be changed through SettingsTable, we need to arrange a common
validation process for each property to rely on, so that the
validation path will be the same regardless of the public interface
used (YAML, JMX, or Virtual Table).

In general, I'd like to avoid building a complex validation framework
for Cassandra's configuration, as the purpose of the project is not to
maintain the configuration itself, so the simpler the validation of
the properties will be, the easier the configuration will be to
maintain.


We might have the following options for building the validation
process, and each of them has its pros and cons:

== 1. ==

Add new annotations to build the property's validation rules (as it
was suggested by David)
@Max, @Min, @NotNull, @Size, @Nullable (already have this one), as
well as custom validators etc.

@Min(5.0) @Max(16.0)
public volatile double phi_convict_threshold = 8.0;

An example of such an approach is the Dropwizard Configuration library
(or Hibernate, Spring)
https://www.dropwizard.io/en/latest/manual/validation.html#annotations


== 2. ==

Add to the @Mutable the set (or single implementation) of validations
it performs, which is closer to what we have now.
As an alternative to having a single class for each constraint, we can
have an enumeration list that stores the same implementations.

public @interface Mutable {
  Class<? extends ConfigurationConstraint<?>>[] constraints() default {};
}

public class NotNullConstraint implements ConfigurationConstraint<Object> {
    public void validate(Object newValue) {
        if (newValue == null)
            throw new IllegalArgumentException("Value cannot be null");
    }
}

public class PositiveConstraint implements ConfigurationConstraint<Object> {
    public void validate(Object newValue) {
        if (newValue instanceof Number && ((Number) newValue).intValue() <= 0)
            throw new IllegalArgumentException("Value must be positive");
    }
}

@Mutable(constraints = { NotNullConstraint.class, PositiveConstraint.class })
public volatile Integer concurrent_compactors;

Something similar is performed for Custom Constraints in Hibernate.
https://docs.jboss.org/hibernate/stable/validator/reference/en-US/html_single/#section-constraint-validator


== 3. ==

Enforce setter method names to match the corresponding property name.
This will allow us to call the setter method with reflection, and the
method itself will do all the validation we need.

public volatile int key_cache_keys_to_save;

public static void setKeyCacheKeysToSave(int keyCacheKeysToSave)
{
    if (keyCacheKeysToSave < 0)
        throw new IllegalArgumentException("key_cache_keys_to_save
must be positive");
    conf.key_cache_keys_to_save = keyCacheKeysToSave;
}


I think the options above are the most interesting for us, but if you
have something more appropriate, please share. From my point of view,
option 2 is the most appropriate here, as it fits with everything we
have discussed in this thread. However, they are all fine to go with.

I'm looking forward to hearing your thoughts.

On Tue, 21 Feb 2023 at 22:06, Maxim Muzafarov <mmu...@apache.org> wrote:
>
> Hello everyone,
>
>
> I would like to share and discuss the key point of the solution design
> with you before I finalise a pull request with tedious changes
> remaining so that we are all on the same page with the changes to the
> valuable Config class and its accessors.
>
> Here is the issue I'm working on:
> "Allow UPDATE on settings virtual table to change running configurations".
> https://issues.apache.org/jira/browse/CASSANDRA-15254
>
> Below is the restricted solution design at a very high level, all the
> details have been discussed in the related JIRA issue.
>
>
> = What we have now =
>
> - We use JMX MBeans to mutate this runtime configuration during the
> node run or to view the configuration values. Some of the JMX MBean
> methods use camel case to match configuration field names;
> - We use the SettingsTable only to view configuration values at
> runtime, but we are not able to mutate the configuration through it;
> - We load the configuration from cassandra.yaml into the Config class
> instance during the node bootstrap (is accessed with
> DatabaseDescriptor, GuardrailsOptions);
> - The Config class itself has nested configurations such as
> ReplicaFilteringProtectionOptions (it is important to keep this always
> in mind);
>
>
> = What we want to achieve =
>
> We want to use the SettingsTable virtual table to change the runtime
> configuration, as we do it now with JMX MBeans, and:
> - If the affected component is updated (or the component's logic is
> executed) before or after the property change, we want to keep this
> behaviour for the virtual table for the same configuration property;
> - We want to ensure consistency of such changes between the virtual
> table API and the JMX API used;
>
>
> = The main question =
>
> To enable configuration management with the virtual table, we need to
> know the answer to the following key question:
> - How can we be sure to determine at runtime which of the properties
> we can change and which we can't?
>
>
> = Options for an answer to the question above =
>
> 1. Rely on the volatile keyword in front of fields in the Config class;
>
> I would say this is the most confusing option for me because it
> doesn't give us all the guarantees we need, and also:
> - We have no explicit control over what exactly we expose to a user.
> When we modify the JMX API, we're implementing a new method for the
> MBean, which in turn makes this action an explicit exposure;
> - The volatile keyword is not the only way to achieve thread safety,
> and looks strange for the public API design point;
> - A good example is the setEnableDropCompactStorage method, which
> changes the volatile field, but is only visible for testing purposes;
>
> 2. Annotation-based exposition.
>
> I have created Exposure(Exposure.Policy.READ_ONLY),
> Exposure(Exposure.Policy.READ_WRITE) annotations to mark all the
> configuration fields we are going to expose to the public API (JMX, as
> well as the SettingsTable) in the Config class. All the configuration
> fields (in the Config class and any nested classes) that we want to
> expose (and already are used by JMX) need to tag with an annotation of
> the appropriate type.
>
> The most confusing thing here, apart from the number of tedious
> changes: we are using reflection to mutate configuration field values
> at runtime, which makes some of the fields look "unused" in the IDE.
> This can be not very pleasant for developers looking at the Config
> class for the first time.
>
> You can find the PR related to this type of change here (only a few
> configuration fields have been annotated for the visibility of all
> changes):
> https://github.com/apache/cassandra/pull/2133/files
>
>
> 3. Enforce setter/getter method name rules by converting these methods
> in camel case to the field name with underscores.
>
> To rely on setter methods, we need to enforce the naming rules of the
> setters. I have collected information about which field names match
> their camel case getter/setter methods:
>
> total: 345
> setters: 109, missed 236
> volatile setters: 90, missed 255
> jmx setters: 35, missed 310
> getters: 139, missed 206
> volatile getters: 107, missed 238
> jmx getters: 63, missed 282
>
> The most confusing part of this type of change is the number of
> changes in additional classes according to the calculation above and
> some difficulties with enforcing this rule for nested configuration
> classes.
>
> Find out what this change is about here:
> https://github.com/apache/cassandra/pull/2172/files
>
>
> = Summary =
>
> In summary, from my point of view, the annotation approach will be the
> most robust solution for us, so I'd like to continue with it. It also
> provides an easy way to extend the SettingTable with additional
> columns such as runtime type (READ_ONLY, READ_WRITE) and a description
> column. This ends up looking much more user-friendly.
>
> Another advantage of the annotation approach is that we can rely on
> this annotation to generate dedicated dynamic JMX beans that only
> respond to node configuration management to avoid any inconsistencies
> like those mentioned here [2] (I have described a similar approach
> here [1], but for metrics). But all this is beyond the scope of the
> current changes.
>
> Looking forward to your thoughts.
>
>
> [1] https://lists.apache.org/thread/26j9hhy39okw0wy79mtylb753w6xjclg
> [2] https://issues.apache.org/jira/browse/CASSANDRA-17734

Reply via email to