Thanks for starting this discussion, Piotr! I think moving towards a more modern format for our logging config would be great. Personally, YAML seems like the nicest format to work with as an operator, and it should be very familiar to anyone who works with Docker and Kubernetes.
A few thoughts:

1. This would establish two different config formats in Kafka: Properties for Kafka configs and YAML/XML/JSON for log configs. Whatever we choose for the Log4j 2 config format, we should also consider it as a possible format for Kafka itself (assuming we ever move towards modernizing our own configs).

2. How do we determine which type of config file has been given? Do we try to infer it from the file extension? What is the behavior if both old and new files exist?

3. Since a bit of time has passed since we voted on KIP-653, we may need to amend it to lay out a deprecation path for the Log4j 1.x properties format.

4. Data bindings and parsers are common sources of CVEs. It looks like SnakeYAML is no exception (https://www.cvedetails.com/version-list/0/66013/1/), though its record doesn't look much worse than Jackson's. Just to point out, adding it will mean a bit more dependency overhead as we keep up with security patches.

-David A

On Tue, Oct 29, 2024 at 8:48 AM Piotr P. Karwasz <pi...@mailing.copernik.eu> wrote:
> Hi,
>
> In the context of the current migration process from Log4j 1.x/Reload4j
> to Log4j Core 2.x[1], I believe that the choice of configuration format
> used by the Kafka binary distribution, should receive a particular
> attention.
>
> Log4j Core 2.x supports four native configuration formats (XML, JSON,
> YAML and Java Properties[2]). The version 1.x XML and Java Properties
> configuration file formats are incompatible with the new formats, but
> they can be converted at runtime, using the `log4j-1.2-api` artifact[3].
> This is of course a transitional option, since the old formats are not
> extensible and do not offer most of the features of Log4j Core 2.x.
>
> While the 2.x Java Properties configuration format might seem as the
> natural migration path for the current Apache Kafka configuration, I
> would strongly advise against this choice.
> The Log4j Core 2.x runtime has a hierarchical structure, which can be
> easily reflected by formats like XML, JSON or YAML, but not so much by
> Java Properties. For this reason the `*.properties` configuration format is:
>
> * very verbose,
>
> * contains a lot of quirks to make it less verbose[4].
>
> If we exclude Java Properties, only three choices remain:
>
> * The default XML format, which has no dependencies (if we exclude the
> JPMS `java.xml` module) and has a schema[5] that can be used to validate
> the configurations. This might, however, strongly contrast with the
> other Kafka configuration files that are maintained as Java Properties.
>
> * The JSON format has a dependency on `jackson-databind`, which is
> already present in the Kafka binary distribution. It is a matter of
> personal taste, but I find it even more verbose than the Java Properties
> format (although it does not have quirks). In Log4j Core 3.x the
> dependency on `jackson-databind` has been replaced with an in-house parser.
>
> * My favorite would be the YAML format, that would require the addition
> of `jackson-dataformat-yaml` (and its `snakeyaml` transitive dependency)
> to the Kafka runtime. The advantage, however, would be that it is
> probably the less verbose of the available formats.
>
> What do you think, which one of the configuration formats available in
> Log4j Core 2.x should be used by default by Kafka?
>
> Piotr
>
> [1] https://github.com/apache/kafka/pull/17373
> [2] https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-factories
> [3] https://logging.apache.org/log4j/2.x/migrate-from-log4j1.html#ConfigurationCompatibility
> [4] https://logging.apache.org/log4j/2.x/manual/configuration.html#java-properties-features
> [5] https://logging.apache.org/xml/ns/

-- 
David Arthur
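To make the verbosity argument about Java Properties concrete, here is a toy Java sketch (not Log4j or Kafka code; the class and method names are made up for illustration) that flattens a nested configuration tree into the dotted `key = value` form a Properties file needs. The tree is a hypothetical, simplified config with one console appender and a root logger. Its key names are modeled loosely on the Log4j 2.x Properties syntax and have not been validated against a real Log4j setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy illustration (hypothetical, not Log4j code): flattening a nested
 * configuration tree into Java Properties-style dotted keys. Every leaf
 * has to repeat its full path, which is where the extra verbosity comes from.
 */
public final class FlattenConfig {

    private FlattenConfig() {}

    /** Flattens nested maps into "parent.child" keys, preserving insertion order. */
    public static Map<String, String> flatten(Map<String, Object> tree) {
        Map<String, String> out = new LinkedHashMap<>();
        flattenInto("", tree, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested level: recurse, extending the dotted prefix.
                flattenInto(key, (Map<String, Object>) e.getValue(), out);
            } else {
                // Leaf: emit the full dotted path.
                out.put(key, String.valueOf(e.getValue()));
            }
        }
    }

    /** Builds an insertion-ordered map from alternating key/value arguments. */
    private static Map<String, Object> node(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Hypothetical, simplified config: one console appender and a root logger. */
    public static Map<String, Object> sampleTree() {
        return node(
            "appender", node(
                "console", node(
                    "type", "Console",
                    "name", "STDOUT",
                    "layout", node("type", "PatternLayout", "pattern", "%d %p %c - %m%n"))),
            "rootLogger", node(
                "level", "info",
                "appenderRef", node("stdout", node("ref", "STDOUT"))));
    }

    public static void main(String[] args) {
        flatten(sampleTree()).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running `main` prints six `key = value` lines, one per leaf. Each leaf repeats its full path (for example `appender.console.layout.pattern`), while XML, JSON or YAML state each level once through nesting.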