Re: [DISCUSS] Nested YAML configs for new features

David Capwell Mon, 29 Nov 2021 15:44:09 -0800

>  but I would hate to repeat the mistakes of our past by evolving the config 
> in a new direction without any coherent overarching design.


At the start I asked to keep the thread local to new features, but to flesh 
out an “overarching design” further, maybe we should increase the “desired” 
scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise 
config and JVM parameters)?  In other words, do we think the following is more 
ideal (configs scoped to a feature)?

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was no consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with the flat structure is that we have no way to enforce 
standard naming; if you look at the hint example there were at least 3 naming 
conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain 
that?).  And one of the core reasons track_warnings went nested was that 
warn/abort sometimes became warn/fail and threshold sometimes became 
thresholds. By embracing the nested structure we can actually enforce 
consistency; with flat we have no way to maintain it.
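
One way to picture the consistency argument is a shared type that every 
threshold section must reuse. The sketch below is only an illustration in 
Python, not Cassandra code: the Thresholds and TrackWarnings classes and the 
build helper are hypothetical names, and it assumes the config has already 
been parsed into plain dicts.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Thresholds:
    # Hypothetical shared type: every section reuses the same two names.
    warn_threshold: str
    abort_threshold: str


@dataclass
class TrackWarnings:
    # Hypothetical feature-scoped section built from the shared type.
    enabled: bool
    local_read_size: Thresholds
    coordinator_read_size: Thresholds


def build(cls, raw):
    """Build dataclass `cls` from a dict, rejecting unknown or missing keys."""
    by_name = {f.name: f for f in fields(cls)}
    unknown = set(raw) - set(by_name)
    if unknown:
        raise ValueError(f"{cls.__name__}: unknown keys {sorted(unknown)}")
    missing = set(by_name) - set(raw)
    if missing:
        raise ValueError(f"{cls.__name__}: missing keys {sorted(missing)}")
    kwargs = {}
    for name, field in by_name.items():
        value = raw[name]
        if is_dataclass(field.type):
            # Nested sections are validated against their own type.
            if not isinstance(value, dict):
                raise ValueError(f"{name}: expected a nested section")
            value = build(field.type, value)
        kwargs[name] = value
    return cls(**kwargs)
```

With this shape, a section that spells abort_threshold as fail_threshold is 
rejected when the config is loaded, because the name is not part of the shared 
type. With flat keys each name is an independent string, so nothing ties 
them together.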

Additionally, by embracing the nested structure we can accept a flat one as 
well (the PR in CASSANDRA-17166 shows this working) if users desire it; so we 
get the consistency of nested, and the “grep” benefits of flat.
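
To make "accept a flat one as well" concrete, here is a minimal sketch in 
Python of expanding "."-separated keys into the nested form. It is not the 
CASSANDRA-17166 implementation (that PR is Java); expand_dotted and merge are 
hypothetical helpers, the "." separator is an assumption taken from the 
earlier summary, and the input is assumed to be already-parsed dicts.

```python
def merge(dst, src):
    """Merge nested dict `src` into `dst`, rejecting conflicting leaves."""
    for key, value in src.items():
        if isinstance(dst.get(key), dict) and isinstance(value, dict):
            merge(dst[key], value)
        elif key in dst:
            raise ValueError(f"duplicate definition for {key!r}")
        else:
            dst[key] = value


def expand_dotted(config):
    """Return a copy of `config` where keys like "a.b.c" become nested dicts."""
    result = {}
    for key, value in config.items():
        # Expand nested sections first so dotted keys inside them work too.
        if isinstance(value, dict):
            value = expand_dotted(value)
        target = result
        *parents, leaf = key.split(".")
        for part in parents:
            child = target.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r} conflicts with a scalar at {part!r}")
            target = child
        existing = target.get(leaf)
        if isinstance(existing, dict) and isinstance(value, dict):
            # The same section was given in both styles: combine them.
            merge(existing, value)
        elif leaf in target:
            raise ValueError(f"duplicate definition for {key!r}")
        else:
            target[leaf] = value
    return result
```

Under these assumptions a fully flat file, a fully nested file, and a mix of 
the two all load to the same structure, while a setting defined twice is 
reported as an error instead of one definition silently winning.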


> On Nov 29, 2021, at 2:17 PM, bened...@apache.org wrote:
> 
> If we’re thinking of moving towards nested configuration, then before 
> employing the approach further we would ideally consider what a fully nested 
> config looks like for the project. Ekaterina has done a lot to clean up 
> inconsistent naming, but I would hate to repeat the mistakes of our past by 
> evolving the config in a new direction without any coherent overarching 
> design.
> 
> In case anyone missed it in the earlier discussion, this was my attempt to 
> prototype a nested config: 
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> 
> I don’t have any specific attachment to it, but settling on some approximate 
> scheme would be helpful IMO.
> 
> From: David Capwell <dcapw...@apple.com.INVALID>
> Date: Monday, 29 November 2021 at 20:38
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> What should our default example cassandra.yaml file use (flat or nested)?  
>> Currently default shows nested
> 
> Was told this statement was confusing, so trying to clarify.  At the moment 
> we do not allow a nested config to be expressed in any way outside of nesting 
> it (excluding YAML’s ability to inline objects), so if we did allow flat 
> config representation of nested configs, then this would be a brand new 
> feature; we currently show the nested structure in cassandra.yaml
> 
>> On Nov 29, 2021, at 11:58 AM, David Capwell <dcapw...@apple.com.INVALID> 
>> wrote:
>> 
>> Thanks everyone for the comments, I hope below is a good summary of all the 
>> talking points?
>> 
>> * We already use nested configs (networking, seed provider, commit log/hint 
>>   compression, back pressure, etc.)
>> * Flat configs are easier to grep, but grep -A/-B and/or yq solve this for 
>>   nested configs
>> * It would be possible to support flat versions of our configs in 
>>   cassandra.yaml (in addition to the nested versions)
>> * The "Settings" vtable currently uses the "_" separator (example of 
>>   encryption/audit log).  Switching to "." would be a change in behavior 
>>   which may impact some users
>> * The "." separator for nested configs is common in other systems (yq, 
>>   Elasticsearch, etc.)
>> * "Structured / nested config is easier for human eyes to read"... "Flat 
>>   config is harder for human eyes but easy for simple scripts"
>> * For learning what configs are enabled, cassandra.yaml isn't the best 
>>   interface as it may not reflect the actual configs; we can better expose 
>>   this in CQL and/or Sidecar
>> * What should our default example cassandra.yaml file use (flat or nested)?  
>>   Currently default shows nested
>> * When projecting the Config into CQL, we may want to consider UDTs to 
>>   represent the complex types
>> * Current limitations in CQL make nested structures hard to work with; it 
>>   may be worthwhile to expand CQL support for nested structures.
>> 
>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be 
>> reusable outside of yaml parsing, 2) support setters (we currently do, but 
>> setters must be snake case… I fixed that), 3) support both nested and 
>> flat, 4) support ignoring fields in a consistent way (Settings vtable 
>> will include things SnakeYAML won’t and vice versa).
>> 
>> https://github.com/apache/cassandra/pull/1335.  This PR is NOT a final 
>> ready to merge thing, but instead a POC to show how we can solve a lot of 
>> the core problems in a consistent and reusable manner.
>> 
>> The following cassandra.yaml was used to show both worlds would work fine in 
>> the config (and complement each other)
>> 
>> track_warnings:
>>   enabled: true
>>   # nested relative to the local level (TrackWarnings)
>>   coordinator_read_size.warn_threshold_kb: 1024
>>   local_read_size.abort_threshold_kb: 1024
>>   row_index_size:
>>     warn_threshold_kb: 1024
>>     abort_threshold_kb: 1024
>> # nested relative to the top level
>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
>> 
>> For the “Settings” vtable, a new Loader interface was added to get all the 
>> properties, and Properties.flatten would turn every property into a 
>> “flattened” version, stopping at properties that are isScalar (isPrimitive 
>> or not hasSubProperties) or isCollection.  This doesn’t solve 100% of the 
>> issues that vtable has (types such as Duration are Scalar but would still 
>> need a translation from String -> Duration), and doesn’t solve the fact 
>> that the table currently uses “_”.
>> 
>>> On Nov 29, 2021, at 10:11 AM, bened...@apache.org wrote:
>>> 
>>> I meant to imply we should improve our UDT usability to support this kind 
>>> of querying, essentially – but that if we support a simple text->property 
>>> setup we might want to offer LIKE support so we can search them (via simple 
>>> filtering, not any index) – which is actually pretty easy to provide.
>>> 
>>> I think we should aim to provide users all the facilities they need to 
>>> interact with config via vtables. If the user requires external tooling, it 
>>> suggests a weakness in CQL that we should address, and maybe help the user 
>>> in other scenarios too…
>>> 
>>> From: Joseph Lynch <joe.e.ly...@gmail.com>
>>> Date: Monday, 29 November 2021 at 17:32
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>> On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
>>> <bened...@apache.org> wrote:
>>>> 
>>>> Maybe we can make our query language more expressive 😊
>>>> 
>>>> We might anyway want to introduce e.g. a LIKE filtering option to 
>>>> find/discover flattened config parameters?
>>> 
>>> This sounds more complicated than just having the settings virtual
>>> table return text (dot encoded) -> text (json) and probably not even
>>> that much more useful. A full table scan on the settings table could
>>> return all top level keys (strings before the first dot) and if we
>>> just return a valid json string then users can bring their own
>>> querying capabilities via jq [1], or one line of code in almost any
>>> programming language (especially python, perl, etc ...).
>>> 
>>> Alternatively if we want to modify the grammar it seems supporting
>>> structured data querying on text fields would be preferable
>>> to LIKE since you could get what you want without a grammar change and
>>> if we could generalize to any text column it would be amazingly useful
>>> elsewhere to users. For example, we could emulate jq's query syntax in
>>> the select which is, imo, best-in-class for quickly querying into
>>> nested structures. Assuming a key (text) -> value (json) schema:
>>> 
>>> 'a' -> '{"b": [{"c": {"d": 4}}]}'
>>> 
>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>>> 
>>> To have exactly jq syntax (but harder to parse) it would be:
>>> 
>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>>> 
>>> Since we're not indexing the structured data in any way, filtering
>>> before selection probably doesn't give us much performance improvement
>>> as we'd still have to parse the whole text field in most cases.
>>> 
>>> -Joey
>>> 
>>> [1] https://stedolan.github.io/jq/
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
> 
> 

