Hi all, I think it's a great idea to use a different configuration method as default. Personally I'd prefer YAML as well because it's the most concise and easy to read format.
1. This would establish two different config formats in Kafka. Properties for kafka configs and YAML/XML/JSON for log configs. Whatever we choose for the log4j2 config format, we should also consider it as a possible format for Kafka itself (assuming we ever move towards modernizing our own configs). I could see a transition for Kafka configs too to YAML. It's widely used in other projects as well and for Kafka too it would make sense to group them for instance by log, storage, networking, security etc.. It definitely has an advantage that we could move to a more well defined structure and away from these long prefixes which are very cumbersome in some places. Perhaps 4.0 would have been a good candidate but given that we should provide backward compatibility anyway, later is good too. 4. Data bindings and parsers are common sources of CVEs. It looks like Snakeyaml is no exception ( https://www.cvedetails.com/version-list/0/66013/1/), though it doesn't look much worse than Jackson. Just to point out, this will add a bit of dependency overhead as we keep up with security patches. It's a good point about CVEs. I haven't seen it on the dev list but were there any conversations about enabling dependabot version updates? With automatic dependabot PRs we could get fixes in as soon as it opens the PR. Best, Viktor On Tue, Oct 29, 2024 at 8:05 PM Piotr P. Karwasz <pi...@mailing.copernik.eu> wrote: > Hi David, > > On 29.10.2024 16:30, David Arthur wrote: > > 2. How do we determine which type of config file has been given? Do we > try > > to infer it based on file extension? What is the behavior if both old and > > new files exist? > > I believe that the current PR[1] shows the correct behavior. Upgraded > systems will have a residual `config/log4j.properties` file. If such a > file exists, it should be used, but users should be warned about its > deprecation. If the `config/log4j.properties` file does not exist and a > `config/log4j2.yaml` file exist, the new file should be used. > > [1] https://github.com/apache/kafka/pull/17373 > > > 4. Data bindings and parsers are common sources of CVEs. It looks like > > Snakeyaml is no exception ( > > https://www.cvedetails.com/version-list/0/66013/1/), though it doesn't > look > > much worse than Jackson. Just to point out, this will add a bit of > > dependency overhead as we keep up with security patches. > > That is an excellent question and a perfect motivation to introduce > Vulnerability Exploitability eXchange between at least the ASF projects. > I would love to do that, but right now there are no tools to automate > that and the cost for maintainers would be astronomical. Hopefully in > some mid term future you could expect Logging Services to publish a VEX > file. We already do it with VDR (which contains only our own > vulnerabilities[2]), but that one can be maintained by hand. > > SnakeYaml would be a deeply nested transitive dependency, but I believe > we can trust the Jackson team to do the right thing: > > * for two CVEs ([3] and [4]) the Jackson team decided to release a new > version of `jackson-dataformat-yaml`. > > * for one CVE that only concerned data binding, the Jackson team didn't > have to do anything[5]. > > On the Logging Services side, we are not so thorough with > vulnerabilities passed on by Jackson: we just bump the version of > Jackson. It is probably worth noting that: > > * The binding of the configuration file to Log4j components is always > done in-house. From Jackson and other dependencies, we only use the tree > parser. > > * A side-effect of having a custom plugin system[6] is that only classes > annotated with `@Plugin` can be instantiated. > > * The XML configuration format is based on a JAXP DOM parser > implementation. > > * The Java Properties format is probably the most subject to security > issues, since it uses an in-house tree parser[7] (obviously based on the > Properties class). We will replace that bunch of code with > `jackson-dataformat-properties` in 3.x[8], which should improve security > by leaving the construction of the tree to parsing experts. The format > will change slightly, which is also a reason I don't recommend Java > Properties. > > Piotr > > [2] https://logging.apache.org/cyclonedx/vdr.xml > > [3] https://github.com/FasterXML/jackson-dataformats-text/pull/328 > > [4] https://github.com/FasterXML/jackson-dataformats-text/issues/342 > > [5] https://github.com/FasterXML/jackson-dataformats-text/issues/382 > > [6] https://logging.apache.org/log4j/2.x/manual/plugins.html > > [7] > > https://github.com/apache/logging-log4j2/blob/2.x/log4j-core/src/main/java/org/apache/logging/log4j/core/config/properties/PropertiesConfigurationBuilder.java > > [8] > > https://logging.staged.apache.org/log4j/3.x/migrate-from-log4j2.html#properties-configuration-file > > >