On Sunday, 27.12.2020 at 22:24 +0100, Werner LEMBERG wrote:
> > Intercepting syscalls (or whatever the library does, I didn't
> > check) doesn't sound like the right approach outside of testing
> > reproducibility.
>
> Why? It's even less intrusive than the `SOURCE_DATE_EPOCH` solution.

I definitely consider intercepting various syscalls by means of
LD_PRELOAD more intrusive than setting a single environment variable
that was invented for the purpose of setting timestamps. Just think of
a shiny new syscall that might introduce a new source of
non-reproducibility.

> > The larger "issue" with this topic seems to be LilyPond's
> > dependencies, in particular Ghostscript. A contribution to add
> > support for above variable was closed as WONTFIX:
> > https://bugs.ghostscript.com/show_bug.cgi?id=696765
>
> Exactly. In particular it means that we had to use the patched
> Debian version of ghostscript for reproducibility if we go the
> `SOURCE_DATE_EPOCH` route – and check which other distributions
> provide something similar. I consider this as a very hacky
> solution. On the other hand, intercepting the time syscalls is a
> completely transparent and clean solution.
>
> BTW, the next version 'libfaketime' will allow to intercept
> `getrandom`, which means that we probably can 'fix' the `/ID` issue
> in PDF files generated by gs, too.
>
> > I think that's a pity, but nothing we can change as a
> > "consumer" of library functions.
>
> Exactly. As long as we don't change LilyPond to produce PDFs by
> itself – which is a huge undertaking that I certainly won't start –
> I think we have no other choice than using something like
> 'libfaketime' or a patched gs version. I definitely prefer the
> former.

What I wanted to say is that we cannot change the Ghostscript
developers' minds about supporting the environment variable. But we
can (and IMHO should) use all available interfaces if we care about
reproducibility. I see at least two more options:

1) Strip non-determinism from the generated PDF. This is even mentioned
at https://reproducible-builds.org/docs/timestamps/, before that page
discusses libfaketime, and it spends more than half of that paragraph
on possible issues.

2) As we control the input PS code, we don't have to worry about the
operators that get the current time, draw a random number, etc. (as
long as we don't use them ourselves). Instead, the bug linked above
says we just need to tell GS which CreationDate and ModDate to use
(via pdfmarks), and these should be straightforward to fill with
values derived from SOURCE_DATE_EPOCH.

This probably leaves the UUIDs (is that the issue you mention above?),
which can be overridden using -sDocumentUUID and -sInstanceUUID.
Setting a constant time using libfaketime will result in the same UUID
for all generated PDFs, so it can't get worse; but I think it would be
desirable to do better than that and compute a "unique" ID based on
the input file, maybe as simple as the hash of the file path. Keep in
mind that per-file values would prevent reuse of the GS API instance;
in that case I'd argue that a constant value should be fine.

Jonas
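P.S. To make option 2 a bit more concrete, here is a minimal Python sketch. It is not existing LilyPond code, and all function names are hypothetical. It turns SOURCE_DATE_EPOCH into a DOCINFO pdfmark that pins CreationDate and ModDate, and derives stable values for -sDocumentUUID and -sInstanceUUID from the input file path. Assumptions: SOURCE_DATE_EPOCH holds integer seconds since the Unix epoch (as in the reproducible-builds specification), dates are written in UTC, and the UUIDs are name-based uuid5 values. Whether Ghostscript accepts the plain 8-4-4-4-12 UUID form for these options should be checked against its documentation.

```python
import datetime
import uuid
from typing import List, Mapping, Optional


def pdf_date_from_epoch(epoch: int) -> str:
    """Format seconds since the Unix epoch as a PDF date string (UTC)."""
    dt = datetime.datetime.fromtimestamp(epoch, tz=datetime.timezone.utc)
    # PDF date syntax: D:YYYYMMDDHHmmSS followed by the UTC marker Z.
    return dt.strftime("D:%Y%m%d%H%M%SZ")


def docinfo_pdfmark(environ: Mapping[str, str]) -> Optional[str]:
    """Return a DOCINFO pdfmark pinning CreationDate and ModDate.

    Returns None when SOURCE_DATE_EPOCH is unset, so the caller can
    leave Ghostscript's default (current-time) behavior alone.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return None
    date = pdf_date_from_epoch(int(raw))
    return f"[ /CreationDate ({date}) /ModDate ({date}) /DOCINFO pdfmark\n"


def uuid_args(input_path: str, environ: Mapping[str, str]) -> List[str]:
    """Build -sDocumentUUID/-sInstanceUUID arguments for gs.

    Hypothetical scheme: name-based (SHA-1) UUIDs derived from the input
    file path, so every input gets its own ID that is stable across runs.
    """
    # NAMESPACE_URL is an arbitrary but fixed namespace choice.
    doc_id = uuid.uuid5(uuid.NAMESPACE_URL, input_path)
    # Instance ID additionally depends on SOURCE_DATE_EPOCH ("0" if unset).
    inst_id = uuid.uuid5(doc_id, environ.get("SOURCE_DATE_EPOCH", "0"))
    return [f"-sDocumentUUID={doc_id}", f"-sInstanceUUID={inst_id}"]


if __name__ == "__main__":
    env = {"SOURCE_DATE_EPOCH": "1609104240"}
    print(docinfo_pdfmark(env), end="")
    print(uuid_args("input.ps", env))
```

The pdfmark text would go into a small PostScript file passed to gs together with the main input, plus the UUID arguments. Mixing SOURCE_DATE_EPOCH into the instance UUID is only one possible choice; as noted above, a constant value would also do when the GS API instance is reused.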