I do not think that supporting both options is an issue. The settings virtual table would have to use the flattened version. If we support both formats, the question would be: what should be the one used by default in the configuration file?
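To illustrate what the flattened version could look like for the settings table, here is a minimal Python sketch. It is not Cassandra code: the `flatten` helper and the sample values are only for illustration, and "." is assumed as the separator. It turns a nested configuration into dot-separated keys:

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(config: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_key, value) pairs for every leaf of a nested mapping."""
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested sections, extending the dotted prefix.
            yield from flatten(value, dotted)
        else:
            # Scalars (and lists/sets) are treated as leaf values.
            yield dotted, value


# Hypothetical sample, modelled on the track_warnings example in this thread.
nested = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {
            "warn_threshold": "10kb",
            "abort_threshold": "1mb",
        },
    }
}

flat = dict(flatten(nested))
for name, value in flat.items():
    print(f"{name}: {value}")
```

With this scheme every row has a unique name, so a single setting can be looked up by its full key, and a map-typed option would come out as one row per map entry.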
On Fri, 26 Nov 2021 at 15:40, bened...@apache.org <bened...@apache.org> wrote:
> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying. I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch (seed nodes, for instance).
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song <bo...@bso.ng.INVALID>
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file.
> To give people who aren't familiar with ElasticSearch
> an idea, here is a config file we use:
>
> cluster.name: foobar
>
> node.remote_cluster_client: false
> node.name: "foo.example.com"
> node.master: true
> node.data: true
> node.ingest: true
> node.ml: false
>
> xpack.ml.enabled: false
> xpack.security.enabled: false
> xpack.security.audit.enabled: false
> xpack.watcher.enabled: false
>
> action.auto_create_index: "+.,-*"
>
> network.host: _global_
>
> discovery.zen.hosts_provider: file
> discovery.zen.minimum_master_nodes: 2
>
> http.publish_host: "foo.example.com"
> http.publish_port: 443
> http.bind_host: 127.0.0.1
>
> transport.publish_host: "bar.example.com"
> transport.bind_host: 0.0.0.0
>
> indices.fielddata.cache.size: 1GB
> indices.breaker.total.use_real_memory: false
>
> path.logs: /var/log/elasticsearch
> path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
> cluster:
>     name: foobar
>
> node:
>     remote_cluster_client: false
>     name: "foo.example.com"
>     master: true
>     data: true
>     ingest: true
>     ml: false
>
> xpack:
>     ml:
>         enabled: false
>     security:
>         enabled: false
>         audit:
>             enabled: false
>     watcher:
>         enabled: false
>
> action:
>     auto_create_index: "+.,-*"
>
> network:
>     host: _global_
>
> discovery:
>     zen:
>         hosts_provider: file
>         minimum_master_nodes: 2
>
> http:
>     publish_host: "foo.example.com"
>     publish_port: 443
>     bind_host: 127.0.0.1
>
> transport:
>     publish_host: "bar.example.com"
>     bind_host: 0.0.0.0
>
> indices:
>     fielddata:
>         cache:
>             size: 1GB
>     breaker:
>         total:
>             use_real_memory: false
>
> path:
>     logs: /var/log/elasticsearch
>     data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the config is flat and keys
> are unique. The virtual tables would need to either support the encoding
> and decoding of the structured config into a flat structure, or use a
> JSON-encoded string value. The use of JSON would make querying individual
> values much harder.
>
> On 22/11/2021 16:16, Joseph Lynch wrote:
> > Isn't one of the primary reasons to have a YAML configuration instead
> > of a properties file to allow typed and structured (implies nested)
> > configuration? I think it makes a lot of sense to group related
> > configuration options (e.g. a feature) into a typed class when we're
> > talking about more than one or two related options.
> >
> > It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > period encoded key->value pairs when required (usually when providing
> > a property or override layer), Spring and Elasticsearch yamls both
> > come to mind.
> > It seems pretty reasonable to support dot encoding and
> > decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >
> > Regarding quickly telling what configuration a node is running, I think
> > we should lean on virtual tables for "what is the current
> > configuration" now that we have them; as others have said, the written
> > cassandra.yaml is not necessarily the current configuration ... and
> > also grep -C or -A exist for this reason.
> >
> > -Joey
> >
> > On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <ble...@apache.org> wrote:
> >> I do not have a strong opinion for one or the other but wanted to raise the
> >> issue I see with the "Settings" virtual table.
> >>
> >> Currently the "Settings" virtual table converts nested options into flat
> >> options using a "_" separator. For those options it allows a user to query
> >> the whole set of options through some hack.
> >> If we decide to move to more nesting (more than one level), it seems to me
> >> that we need to change the way this table is behaving and how we can query
> >> its data.
> >>
> >> We would need to start using "." as a nesting separator to ensure that
> >> things are consistent between the configuration and the table and add
> >> support for LIKE restrictions for filtering queries to allow operators to
> >> select the precise set of settings they are looking for.
> >>
> >> Doing so is not really complicated in itself but might impact some users.
> >>
> >> On Fri, 19 Nov 2021 at 22:39, David Capwell <dcapw...@apple.com.invalid>
> >> wrote:
> >>
> >>>> it is really handy to grep
> >>>> cassandra.yaml on some config key and you know the value instantly.
> >>> You can still do that
> >>>
> >>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>> # coordinator_read_size:
> >>> #     warn_threshold_kb: 0
> >>> #     abort_threshold_kb: 0
> >>>
> >>> I was also arguing we should support nested and flat, so if your infra
> >>> works better with flat then you could use
> >>>
> >>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>
> >>>> On Nov 19, 2021, at 1:34 PM, David Capwell <dcapw...@apple.com> wrote:
> >>>>
> >>>>> With the flat structure it turns into a properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>
> >>>> For the majority of our configs yes, but there is a subset where flat
> >>>> properties are annoying:
> >>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> >>>> with separators as the format doesn't support them
> >>>> seed_provider.parameters - this is a map type... so would need to do
> >>>> something like seed_provider.parameters="{\"a\": \"a\"}" ... Maybe we
> >>>> special case maps as dynamic fields? Then seed_provider.parameters.a=a?
> >>>> We have ParameterizedClass all over the code
> >>>> So, as long as we define how to deal with java collections, we could in
> >>>> theory support properties files (not arguing for that in this thread) as
> >>>> well as system properties.
> >>>>
> >>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski
> >>>>> <lewandowski.ja...@gmail.com> wrote:
> >>>>> With the flat structure it turns into a properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>
> >>>>>
> >>>>> - - -- --- ----- -------- -------------
> >>>>> Jacek Lewandowski
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe
> >>>>> <calebrackli...@gmail.com> wrote:
> >>>>>
> >>>>>> If it's nested, "track_warnings" would still work if you're grepping
> >>>>>> around in vim or less.
> >>>>>>
> >>>>>> I'd have to concede the point about grep output, although there are
> >>>>>> tools like https://github.com/kislyuk/yq that could probably be bent
> >>>>>> to do what you want.
> >>>>>>
> >>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic
> >>>>>> <stefan.mikloso...@instaclustr.com> wrote:
> >>>>>>
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> while I do not oppose nested structure, it is really handy to grep
> >>>>>>> cassandra.yaml on some config key and you know the value instantly.
> >>>>>>> This is not possible (easily and quickly) when it is nested, as it is
> >>>>>>> on two lines. Or maybe my grepping is just not advanced enough to
> >>>>>>> cover this case? If it is flat, I can just grep "track_warnings" and
> >>>>>>> I have them all.
> >>>>>>>
> >>>>>>> Can you elaborate on your last bullet point? Parsing layer ... What
> >>>>>>> do you mean specifically?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell <dcapw...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>> This has been brought up in a few tickets, so pushing to the dev
> >>>>>>>> list.
> >>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> >>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> >>>>>>>> CASSANDRA-17147 - Guardrails prototype
> >>>>>>>>
> >>>>>>>> In short, do we as a project wish to move "new features" into nested
> >>>>>>>> YAML when the feature has "enough" to justify the nesting?
> >>>>>>>> I would really like to focus this discussion on new features rather
> >>>>>>>> than retroactively grouping (leaving that to CASSANDRA-15234), as
> >>>>>>>> there is already a place to talk about that.
> >>>>>>>>
> >>>>>>>> To get things started, let's start with the track-warning feature
> >>>>>>>> (hard/soft limits for queries). Currently the configs look as
> >>>>>>>> follows (assuming 15234)
> >>>>>>>>
> >>>>>>>> track_warnings:
> >>>>>>>>     enabled: true
> >>>>>>>>     coordinator_read_size:
> >>>>>>>>         warn_threshold: 10kb
> >>>>>>>>         abort_threshold: 1mb
> >>>>>>>>     local_read_size:
> >>>>>>>>         warn_threshold: 10kb
> >>>>>>>>         abort_threshold: 1mb
> >>>>>>>>     row_index_size:
> >>>>>>>>         warn_threshold: 100mb
> >>>>>>>>         abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> or should this be "flat"
> >>>>>>>>
> >>>>>>>> track_warnings_enabled: true
> >>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> >>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> For me I prefer nested for a few reasons
> >>>>>>>> * easier to enforce consistency as the configs can use shared types;
> >>>>>>>>   in the track warnings patch I had mismatches across configs (warn
> >>>>>>>>   vs warns, fail vs abort, etc.) before going nested, now everything
> >>>>>>>>   reuses the same types
> >>>>>>>> * even though it is longer, it can be clearer how things are related
> >>>>>>>> * parsing layer can add support for mixed or purely flat depending
> >>>>>>>>   on user preference (example:
> >>>>>>>>   track_warnings.row_index_size.abort_threshold, using the '.'
> >>>>>>>>   notation to represent nested structures)
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
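
For reference, the parsing-layer idea from the original proposal (accepting the '.' notation alongside nested YAML, mixed in the same file) could look roughly like the following. This is a minimal Python sketch under stated assumptions, not Cassandra's parser: `expand_dotted` and `_merge` are hypothetical names, the input is assumed to be the already-loaded YAML mapping, and conflicting definitions are assumed to be an error rather than "last one wins".

```python
from typing import Any, Dict


def _merge(node: Dict[str, Any], leaf: str, value: Any, full_key: str) -> None:
    """Store value under node[leaf], merging sub-sections and rejecting duplicates."""
    if leaf not in node:
        node[leaf] = value
        return
    existing = node[leaf]
    if isinstance(existing, dict) and isinstance(value, dict):
        # Both definitions are sections: merge them key by key.
        for sub_key, sub_value in value.items():
            _merge(existing, sub_key, sub_value, f"{full_key}.{sub_key}")
    else:
        raise ValueError(f"duplicate definition of '{full_key}'")


def expand_dotted(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new nested dict where keys like 'a.b.c' become {'a': {'b': {'c': ...}}}."""
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Nested sections may themselves contain dotted keys.
            value = expand_dotted(value)
        parts = key.split(".")
        node = result
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"'{key}' conflicts with a scalar value at '{part}'")
            node = child
        _merge(node, parts[-1], value, key)
    return result


# Hypothetical mixed input: one option nested, one written flat.
mixed = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold": "100mb"},
    },
    "track_warnings.row_index_size.abort_threshold": "1gb",
}

expanded = expand_dotted(mixed)
print(expanded)
```

Rejecting duplicates is a design choice made only for this sketch; a real parser would have to decide whether the flat or the nested spelling takes precedence when both are present.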