Re: [DISCUSS] Micro-modules in Commons, lang/compress examples

Piotr P. Karwasz Thu, 30 Oct 2025 03:45:36 -0700

Hi Vladimir,

On 30.10.2025 06:49, Vladimir Sitnikov wrote:
> Motivation (real-world pain):
> 
> As Sebb noted, unused classes shouldn’t affect runtime, however 
> vulnerability scanners flag artifacts, not “used classes”. In
> practice teams must upgrade/patch even when only a tiny part is 
> affected; proving non-impact is often harder than bumping or
> excluding.



I don’t think we should adapt our architecture to the limitations of
“security scanners”. These tools should evolve to understand our
architecture and provide meaningful, context-aware warnings.

For CVE-2025-48924 (commons-lang3), Gary and I prepared several VEX
files documenting the impact on other Commons components ([1]-[4]),
which helped assess the situation for Apache Solr ([5], 400+ deps).
This was mostly manual work assisted by IDE reachability analysis, but
the VEX Tooling initiative ([6]) I’m involved in is close to
automating this and generating rich VEX statements (“VEXplanations”).

SecObserve ([7]) is also integrating VEX data from dependencies.

> Concrete proposal (small, testable):
> 
> Pilot a commons-stringutils4 artifact containing only StringUtils 
> and Strings (and minimal shared internals if any). Use 
> org.apache.commons.stringutils4 package so it could co-exist with 
> the current commons-lang3.
> 
> The existing commons-lang3 could depend on commons-stringutils4 so 
> lang3.StringUtils could delegate all the methods to 
> stringutils4.StringUtils.


I think extracting `commons-stringutils4` would be overkill. While
modularization can improve isolation, it also adds maintenance and
dependency complexity. If we go in that direction, it should follow
clear criteria, for example:

 - Packages with specific external dependencies,
 - Packages requiring an additional JPMS module,
 - Packages needing privileged capabilities (processes, file/network
   access, reflection). There’s a Java project inspired by Google
   Capslock ([8]) defining such capabilities. I can share refs once
   available.

Some parts of `commons-lang3` (e.g., reflection utilities) could be
good candidates for extraction, but I don’t think `StringUtils` fits
that case.

Instead, we could make `StringUtils` easier to *shade* by reducing
unnecessary inter-class references. Currently, shading only
`StringUtils` (using the Maven Shade Plugin with `<minimizeJar>`
enabled) still pulls in about 37% of Commons Lang, which seems
excessive. Improving this would help projects that shade parts of Lang
instead of copying them.
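
To illustrate why inter-class references matter here, below is a
minimal sketch of class-level reachability, roughly the kind of
closure a minimizing shader has to compute. The class names other than
`StringUtils` and all the edges are made up for illustration; this is
not the real Commons Lang reference graph, and it is not how the Maven
Shade Plugin is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShadeReachability {

    /**
     * Returns every class reachable from root by following
     * class-to-class references (root itself included).
     */
    public static Set<String> reachable(Map<String, List<String>> refs, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            // Classes without an entry are treated as having no references.
            for (String next : refs.getOrDefault(current, List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical reference graph: HelperA..C and Unrelated1..2 are
        // placeholders, not actual Commons Lang classes.
        Map<String, List<String>> refs = Map.of(
            "StringUtils", List.of("HelperA", "HelperB"),
            "HelperA", List.of("HelperC"),
            "HelperB", List.of(),
            "HelperC", List.of("StringUtils"),
            "Unrelated1", List.of("StringUtils"),
            "Unrelated2", List.of());

        Set<String> kept = reachable(refs, "StringUtils");
        System.out.println(kept.size() + " of " + refs.size() + " classes kept: " + kept);
    }
}
```

In this toy graph, dropping the single `HelperA` reference from
`StringUtils` would remove both `HelperA` and `HelperC` from the
shaded result, which is the kind of gain I have in mind.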

This wouldn’t fully solve the vulnerability management issue you
mention, but it would encourage better security practices overall.
Shading often has a bad reputation because it’s misused: projects shade
entire dependency stacks instead of using dedicated classloaders (e.g.
Spring Boot Loader). However, *selective* shading of individual classes
or small subsets is quite different:

 - It limits exposure by including only the relevant parts of the
   library.
 - It is safer than copying, since security fixes remain available and
   often reach users before disclosures.
 - It preserves the library’s pedigree, allowing (some shading-aware)
   scanners to still link findings to the original component.

Of course, there are trade-offs, the main one being reduced code reuse
between libraries that share transitive dependencies.

As a side note, GraalVM performs a similar kind of “shading” by
including only reachable classes in its native image. I’m not sure
whether this happens at method or class granularity, but it reflects the
same principle: minimizing the attack surface by minimizing what’s
bundled. I used its capabilities to provide examples of native
applications that only contain parts of Log4j Core [9].

Piotr

References:
[1] https://github.com/apache/commons-bcel/tree/master/src/conf/security
[2] https://github.com/apache/commons-compress/tree/master/src/conf/security
[3] https://github.com/apache/commons-configuration/tree/master/src/conf/security
[4] https://github.com/apache/commons-text/tree/master/src/conf/security
[5] https://github.com/apache/solr-site/pull/152
[6] https://github.com/vex-generation-toolset
[7] https://github.com/MaibornWolff/SecObserve/issues/3177
[8] https://github.com/google/capslock
[9] https://github.com/apache/logging-log4j-samples/tree/main/log4j-samples-graalvm
