Re: [DISCUSS] Nested YAML configs for new features

Benjamin Lerer Mon, 29 Nov 2021 07:54:33 -0800

I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?
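
To make "supporting both formats" concrete, here is a minimal Python sketch of converting between the nested form and a "."-separated flat form, such as a settings table could expose. It is only an illustration, not Cassandra code: the function names flatten and unflatten are hypothetical, and the sample values are taken from the track_warnings example quoted further down the thread.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into a single-level dict with '.'-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested form from '.'-joined keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        node = nested
        *parents, leaf = path.split(".")
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. both "a: 1" and "a.b: 2" were supplied
                raise ValueError(f"{path!r} conflicts with scalar at {part!r}")
            node = child
        if leaf in node:
            raise ValueError(f"duplicate key {path!r}")
        node[leaf] = value
    return nested


# Hypothetical sample, mirroring the track_warnings example in this thread.
nested = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {
            "warn_threshold": "10kb",
            "abort_threshold": "1mb",
        },
    }
}

flat = flatten(nested)
for name, value in flat.items():
    print(f"{name}: {value}")
```

Because flatten also accepts keys that already contain dots, a file that mixes both styles can be normalised with unflatten(flatten(config)). Which form the shipped configuration file uses by default is then independent of what the parser accepts.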


On Fri, 26 Nov 2021 at 15:40, bened...@apache.org <bened...@apache.org> wrote:

> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying. I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch – seed nodes, for instance.
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song <bo...@bso.ng.INVALID>
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file. To give people who aren't familiar with ElasticSearch
> an idea, here is a config file we use:
>
>     cluster.name: foobar
>
>     node.remote_cluster_client: false
>     node.name: "foo.example.com"
>     node.master: true
>     node.data: true
>     node.ingest: true
>     node.ml: false
>
>     xpack.ml.enabled: false
>     xpack.security.enabled: false
>     xpack.security.audit.enabled: false
>     xpack.watcher.enabled: false
>
>     action.auto_create_index: "+.,-*"
>
>     network.host: _global_
>
>     discovery.zen.hosts_provider: file
>     discovery.zen.minimum_master_nodes: 2
>
>     http.publish_host: "foo.example.com"
>     http.publish_port: 443
>     http.bind_host: 127.0.0.1
>
>     transport.publish_host: "bar.example.com"
>     transport.bind_host: 0.0.0.0
>
>     indices.fielddata.cache.size: 1GB
>     indices.breaker.total.use_real_memory: false
>
>     path.logs: /var/log/elasticsearch
>     path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
>     cluster:
>          name: foobar
>
>     node:
>          remote_cluster_client: false
>          name: "foo.example.com"
>          master: true
>          data: true
>          ingest: true
>          ml: false
>
>     xpack:
>          ml:
>              enabled: false
>          security:
>              enabled: false
>              audit:
>                  enabled: false
>          watcher:
>              enabled: false
>
>     action:
>          auto_create_index: "+.,-*"
>
>     network:
>          host: _global_
>
>     discovery:
>          zen:
>              hosts_provider: file
>              minimum_master_nodes: 2
>
>     http:
>          publish_host: "foo.example.com"
>          publish_port: 443
>          bind_host: 127.0.0.1
>
>     transport:
>          publish_host: "bar.example.com"
>          bind_host: 0.0.0.0
>
>     indices:
>          fielddata:
>              cache:
>                  size: 1GB
>          breaker:
>              total:
>                  use_real_memory: false
>
>     path:
>          logs: /var/log/elasticsearch
>          data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the config is flat and keys
> are unique. The virtual tables would need to either support the encoding
> and decoding of the structured config into a flat structure, or use a JSON
> encoded string value. The use of JSON would make querying individual
> values much harder.
>
> On 22/11/2021 16:16, Joseph Lynch wrote:
> > Isn't one of the primary reasons to have a YAML configuration instead
> > of a properties file to allow typed and structured (implies nested)
> > configuration? I think it makes a lot of sense to group related
> > configuration options (e.g. a feature) into a typed class when we're
> > talking about more than one or two related options.
> >
> > It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > period encoded key->value pairs when required (usually when providing
> > a property or override layer), Spring and Elasticsearch yamls both
> > come to mind. It seems pretty reasonable to support dot encoding and
> > decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >
> > Regarding quickly telling what configuration a node is running I think
> > we should lean on virtual tables for "what is the current
> > configuration" now that we have them, as others have said the written
> > cassandra.yaml is not necessarily the current configuration ... and
> > also grep -C or -A exist for this reason.
> >
> > -Joey
> >
> > On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer<ble...@apache.org> wrote:
> >> I do not have a strong opinion for one or the other but wanted to raise
> >> the issue I see with the "Settings" virtual table.
> >>
> >> Currently the "Settings" virtual table converts nested options into flat
> >> options using a "_" separator. For those options it allows a user to
> >> query the whole set of options through some hack.
> >> If we decide to move to more nesting (more than one level), it seems to
> >> me that we need to change the way this table is behaving and how we can
> >> query its data.
> >>
> >> We would need to start using "." as a nesting separator to ensure that
> >> things are consistent between the configuration and the table, and add
> >> support for LIKE restrictions for filtering queries so that operators
> >> can select the precise set of settings they are looking for.
> >>
> >> Doing so is not really complicated in itself but might impact some
> >> users.
> >>
> >> On Fri, 19 Nov 2021 at 22:39, David Capwell<dcapw...@apple.com.invalid> wrote:
> >>
> >>>> it is really handy to grep
> >>>> cassandra.yaml on some config key and you know the value instantly.
> >>> You can still do that
> >>>
> >>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>> #     coordinator_read_size:
> >>> #         warn_threshold_kb: 0
> >>> #         abort_threshold_kb: 0
> >>>
> >>> I was also arguing we should support nested and flat, so if your infra
> >>> works better with flat then you could use
> >>>
> >>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>
> >>>> On Nov 19, 2021, at 1:34 PM, David Capwell<dcapw...@apple.com> wrote:
> >>>>
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>
> >>>> For the majority of our configs yes, but there is a subset where flat
> >>>> properties are annoying
> >>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> >>>> with separators as the format doesn't support
> >>>> seed_provider.parameters - this is a map type… so would need to do
> >>>> something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we
> >>>> special case maps as dynamic fields?  Then seed_provider.parameters.a=a?
> >>>> We have ParameterizedClass all over the code
> >>>> So, as long as we define how to deal with java collections; we could in
> >>>> theory support properties files (not arguing for that in this thread) as
> >>>> well as system properties.
> >>>>
> >>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>
> >>>>>
> >>>>> - - -- --- ----- -------- -------------
> >>>>> Jacek Lewandowski
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> >>>>>
> >>>>>> If it's nested, "track_warnings" would still work if you're grepping
> >>>>>> around in vim or less.
> >>>>>>
> >>>>>> I'd have to concede the point about grep output, although there are
> >>>>>> tools like https://github.com/kislyuk/yq that could probably be bent
> >>>>>> to do what you want.
> >>>>>>
> >>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> >>>>>>
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> while I do not oppose nested structure, it is really handy to grep
> >>>>>>> cassandra.yaml on some config key and you know the value instantly.
> >>>>>>> This is not possible (easily and quickly) when it is nested, as it is
> >>>>>>> on two lines. Or maybe my grepping is just not advanced enough to
> >>>>>>> cover this case? If it is flat, I can just grep "track_warnings" and
> >>>>>>> I have them all.
> >>>>>>>
> >>>>>>> Can you elaborate on your last bullet point? Parsing layer ... What
> >>>>>>> do you mean specifically?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dcapw...@gmail.com> wrote:
> >>>>>>>> This has been brought up in a few tickets, so pushing to the dev
> >>>>>>>> list.
> >>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> >>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> >>>>>>>> CASSANDRA-17147 - Guardrails prototype
> >>>>>>>>
> >>>>>>>> In short, do we as a project wish to move "new features" into nested
> >>>>>>>> YAML when the feature has "enough" to justify the nesting?  I would
> >>>>>>>> really like to focus this discussion on new features rather than
> >>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as there is
> >>>>>>>> already a place to talk about that.
> >>>>>>>>
> >>>>>>>> To get things started, let's start with the track-warning feature
> >>>>>>>> (hard/soft limits for queries), currently the configs look as follows
> >>>>>>>> (assuming 15234)
> >>>>>>>>
> >>>>>>>> track_warnings:
> >>>>>>>>    enabled: true
> >>>>>>>>    coordinator_read_size:
> >>>>>>>>        warn_threshold: 10kb
> >>>>>>>>        abort_threshold: 1mb
> >>>>>>>>    local_read_size:
> >>>>>>>>        warn_threshold: 10kb
> >>>>>>>>        abort_threshold: 1mb
> >>>>>>>>    row_index_size:
> >>>>>>>>        warn_threshold: 100mb
> >>>>>>>>        abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> or should this be "flat"
> >>>>>>>>
> >>>>>>>> track_warnings_enabled: true
> >>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> >>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> For me I prefer nested for a few reasons
> >>>>>>>> * easier to enforce consistency as the configs can use shared types;
> >>>>>>>> in the track warnings patch I had mismatches across configs (warn vs
> >>>>>>>> warns, fail vs abort, etc.) before going nested, now everything
> >>>>>>>> reuses the same types
> >>>>>>>> * even though it is longer, it can be clearer how things are related
> >>>>>>>> * parsing layer can add support for mixed or purely flat depending on
> >>>>>>>> user preference (example:
> >>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.' notation
> >>>>>>>> to represent nested structures)
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>>
> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org
> >>>>>>>>
