* Guillem Jover <guil...@debian.org> [2024-11-22 12:29]:
[...]
>   * There were concerns (from Fay) about whether given same input the
>     output changes per arch or hw setup, we'd need to check this; I'd
>     expect this not to be the case for different arches, but it might
>     be an issue with number of cores for example, but if either is true
>     this would be a serious blocker.
>   * There were concerns (from Fay) about the output stream changing due
>     to a potential implementation switch and that affecting external
>     reproducibility. Personally I think while I can see how this is
>     annoying for the involved parties, it's part of the "you need
>     the same tools to generate the same output" premise that we also
>     assume in Debian. I guess keeping both implementations around
>     indefinitely, I think, would make this less of an issue, with the
>     potential drawbacks mentioned in the previous point.
[...]

I did some more testing with zlib-ng.  With the original zlib, you will always
get an identical output stream given the same input stream and compressor
parameters (compression level being the only one that's commonly varied in ZIP
files).  I expected that zlib-ng would often produce a different output stream
than the original, but what I found was a lot more non-deterministic than just
that.

With zlib-ng, feeding the data into the compressor in e.g. 1024-byte chunks
always gave me a different output stream than using 4096-byte chunks (at
compression level 6).  In fact, every chunk size I tried gave a different
output.  And that's with fixed-size chunks, which is not a given if you're
handling e.g. a stream of input.
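
A minimal sketch of this kind of comparison, assuming Python's zlib module is
linked against the library under test.  The helper name deflate_in_chunks and
the sample data are made up for illustration, and whether the two streams match
depends on which implementation is actually loaded:

```python
import zlib


def deflate_in_chunks(data: bytes, chunk_size: int, level: int = 6) -> bytes:
    """Compress data by feeding it to one compressor in fixed-size chunks."""
    comp = zlib.compressobj(level)
    out = []
    for i in range(0, len(data), chunk_size):
        # Each call hands the compressor at most chunk_size bytes.
        out.append(comp.compress(data[i:i + chunk_size]))
    out.append(comp.flush())
    return b"".join(out)


# Hypothetical sample input; any reasonably large, compressible data will do.
data = b"".join(b"line %d of some sample input\n" % i for i in range(20000))

small = deflate_in_chunks(data, 1024)
large = deflate_in_chunks(data, 4096)

# Both streams must always decompress to the same input...
assert zlib.decompress(small) == zlib.decompress(large) == data
# ...but whether the compressed bytes are identical is what is being tested.
print("identical compressed output:", small == large)
```

Both streams are valid either way; the only question is whether the compressed
bytes themselves are stable across chunk sizes.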

Even using the same buffer size, I cannot get an identical compressed output
stream with Python and Java any more, presumably because of subtle
implementation differences (in the stdlib code that ends up calling zlib to do
the compression) that do not affect zlib but clearly do affect zlib-ng.

This makes zlib-ng unsuitable for use cases where you need to be able to create
an identical output stream without knowing exactly how the bytes were fed into
the zlib compressor (or simply have no way to control this).  It fundamentally
breaks my tooling in ways I can't fix by using the same build environment,
because programs that used to produce identical and deterministic output with
zlib no longer do with zlib-ng, despite using the exact same zlib-ng .so.

- Fay
