Hi Emmanuel,

On 4.11.2024 09:27, Emmanuel Bourg wrote:
> On 02/11/2024 at 13:19, Gilles Sadowski wrote:
>
>> My question was and still is:  Can modularization help?
>
> Modularization is very much needed for Commons Compress, I think we should at least split the compression part from the archive part, and then further split by file format (commons-compression-* and commons-archive-*).
>
> This would greatly improve the situation with the optional dependencies. Let's say I want to extract xz files, I would simply declare a dependency on commons-compression-xz in my pom.xml, which would pull the org.tukaani:xz dependency transitively, instead of having to declare two dependencies on commons-compress and org.tukaani:xz.

Totally agree. Modularization can also help keep track of which library actually uses a dependency. Having a direct runtime dependency on `commons-compression-xz` is much more expressive than having a direct runtime dependency on `org.tukaani:xz`: you know immediately which project is responsible for adding `xz` to your application's dependency stack.

> Unfortunately modularization will hardly help with the dependencies on commons-codec, commons-lang and commons-io, because as of commons-compress 1.26 they are used almost everywhere in the code. Only commons-codec is restricted to a couple of formats (LZ4 and Snappy), but commons-io is now part of the core *public* API [1], which is really bad because we can no longer remove the dependency without breaking the binary compatibility (and I think we should do it as soon as possible before this new public method gets widely used).

As was already suggested in a previous discussion [1], modularizing Commons Compress requires a new major version anyway: to make a _slim_ version of `commons-compress`, we need to change the package name and the artifact name.

If we decide to modularize, I can do some initial work in a branch. Since Commons Compress is a well-designed library, it probably only needs:

1. To drop big dependencies that are not used a lot. We can shade a couple of `commons-lang` and `commons-io` classes to profit from code reuse.

2. To drop some old compression formats: Pack200 constitutes some 50% of the library and is barely used (or usable).

3. To extend the `CompressorStreamFactory` interface to have a generic mechanism to pass compression options (compression level, etc.) to the `CompressorStreamFactory` implementations.

4. To be split into modules.
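
Point 3 could look roughly like the sketch below. Everything in it is hypothetical and is not the current Commons Compress API: the `OptionAwareProvider` interface, the `compression.level` key and the `DeflateProvider` class are invented for illustration, and the example is backed by `java.util.zip` only so that it runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch of a generic option mechanism; NOT the Commons Compress API. */
final class OptionsSketch {

    /** Hypothetical option key; a real design would have to agree on names and value types. */
    static final String LEVEL = "compression.level";

    /** Hypothetical provider contract: options travel as a generic map. */
    interface OptionAwareProvider {
        OutputStream createCompressorOutputStream(String name, OutputStream out,
                Map<String, Object> options) throws IOException;
    }

    /** Example provider backed by java.util.zip. */
    static final class DeflateProvider implements OptionAwareProvider {
        @Override
        public OutputStream createCompressorOutputStream(String name, OutputStream out,
                Map<String, Object> options) throws IOException {
            if (!"deflate".equals(name)) {
                throw new IOException("Unsupported compressor: " + name);
            }
            Object level = options.getOrDefault(LEVEL, Deflater.DEFAULT_COMPRESSION);
            if (!(level instanceof Integer)) {
                throw new IllegalArgumentException(LEVEL + " must be an Integer");
            }
            Deflater deflater = new Deflater((Integer) level);
            // DeflaterOutputStream does not end a caller-supplied Deflater,
            // so close() is wrapped to release its native resources.
            return new DeflaterOutputStream(out, deflater) {
                @Override
                public void close() throws IOException {
                    try {
                        super.close();
                    } finally {
                        deflater.end();
                    }
                }
            };
        }
    }

    /** Compresses data with the given level passed through the generic options map. */
    static byte[] compress(byte[] data, int level) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Map<String, Object> options = Map.of(LEVEL, level);
        try (OutputStream out = new DeflateProvider()
                .createCompressorOutputStream("deflate", sink, options)) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    /** Inflates zlib-wrapped data produced by compress(). */
    static byte[] decompress(byte[] data) throws IOException {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "commons-compress ".repeat(200).getBytes(StandardCharsets.UTF_8);
        byte[] fast = compress(input, Deflater.BEST_SPEED);
        byte[] best = compress(input, Deflater.BEST_COMPRESSION);
        System.out.println(input.length + " bytes in, " + fast.length
                + " (BEST_SPEED), " + best.length + " (BEST_COMPRESSION)");
    }
}
```

A map keeps the factory signature stable when a format gains new knobs, at the price of losing compile-time type checks; typed per-format option objects would be the obvious alternative to weigh.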

Piotr

[1] https://lists.apache.org/thread/pdpplp6lvov6prd6tmhmrd13nzcz5zdq

