> I definitely consider intercepting various syscalls by means of
> LD_PRELOADing more intrusive than setting a single environment
> variable that was invented for the purpose of setting timestamps.
> Just think of a new shiny syscall that might add a new source of
> non-reproducibility.

What 'new shiny syscall' would influence the creation of PDFs, which
are specified by international standards?  I think this is a straw man
argument.  I dare say that the ghostscript interface changes of the
last few years are far more numerous (look at the LilyPond commits
Masamichi had to implement) than the changes to the time interfaces
(of which, AFAIK, there have been none for a long time, but I'm not an
expert)...

> 1) Strip non-determinism from the generated PDF. This is even
> mentioned at https://reproducible-builds.org/docs/timestamps/ -
> before discussing libfaketime which spends more than half of the
> paragraph mentioning possible issues.  [...]

This is what I started with, see the attached experimental stuff.
However, I stopped working on it since it will always remain a partial
solution, because ...

> This probably leaves the UUIDs (is that the issue you mention
> above?) which can be overridden using -sDocumentUUID and
> -sInstanceUUID.

... there is one additional field called `/ID` in (some) PDF output
files that is apparently a random-based value.  I've contacted some gs
people to get more info on that.

It also seems that ghostscript's creation and insertion of subsetted
fonts depends on the system time.  To me this looks like a gs bug.
During my tests, a lot of PDFs – even with the above experimental
changes – had exactly this problem (that is, the subsetted fonts were
not identical in spite of completely identical source fonts), which
means that you can't circumvent it.  Using 'libfaketime', this issue
magically disappears.

> Setting a constant time using libfaketime will result in the same
> UUID for all generated PDFs, so it can't get worse; but I think it
> would be desirable to do better than that and compute a "unique" ID
> based on the input file, maybe as simple as the hash of the file
> path.

Well, UUIDs as used by ghostscript are based on both the time and hash
values, which means that we actually *do* get unique UUIDs, with the
restriction that the first 12 digits of the UUID are a fixed value
because of the frozen time.  In other words, this is not a reason to
reject the use of 'libfaketime'.


    Werner

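To illustrate the point about frozen time, here is a minimal sketch of
a UUID-shaped ID whose first 12 hex digits come from a timestamp and
whose remaining 20 come from a hash.  This is *not* ghostscript's
actual algorithm; the function `sketch_uuid`, the use of SHA-256, the
timestamp value, and the two file names are all assumptions made up
for the illustration.

```python
import hashlib


def sketch_uuid(frozen_time: int, data: bytes) -> str:
    """Build a UUID-shaped string: 12 hex digits of time, 20 of hash.

    Hypothetical layout for illustration only, not ghostscript's code.
    """
    # The time contributes exactly 12 hex digits (48 bits).
    time_part = format(frozen_time & 0xFFFFFFFFFFFF, "012x")
    # The hash of the input contributes the other 20 hex digits.
    hash_part = hashlib.sha256(data).hexdigest()[:20]
    h = time_part + hash_part
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


# Assumed frozen clock value, as libfaketime might provide it.
FROZEN = 1_600_000_000

# Hypothetical inputs standing in for two different documents.
a = sketch_uuid(FROZEN, b"Documentation/notation.pdf")
b = sketch_uuid(FROZEN, b"Documentation/learning.pdf")

print(a)
print(b)
```

With the clock frozen, both IDs share the same leading 12 digits, but
the hash-derived remainder still differs per input, so the IDs stay
distinct.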
diff --git a/Documentation/GNUmakefile b/Documentation/GNUmakefile
index a8c96dcbdb..412cc866ef 100644
--- a/Documentation/GNUmakefile
+++ b/Documentation/GNUmakefile
@@ -213,11 +213,13 @@ ifeq ($(USE_EXTRACTPDFMARK),yes)
 		-dAutoRotatePages=/None \
 		-dPrinted=false \
 		-sOutputFile=$@ \
+		-sDocumentUUID="00000000-0000-0000-0000-000000000000" \
 		-c "30000000 setvmthreshold" \
 		-I $(top-build-dir)/out-fonts \
 		-I $(top-build-dir)/out-fonts/Font \
 		$(outdir)/$*.pdfmark \
-		$(outdir)/$*.tmp.pdf
+		$(outdir)/$*.tmp.pdf \
+		$(top-src-dir)/Documentation/no-pdf-dates.ps
 	rm $(outdir)/$*.tmp.pdf
 else
 	mv $(outdir)/$*.tmp.pdf $@
@@ -677,8 +679,10 @@ $(outdir)/%.pdf: %.eps
 		-dNOPAUSE \
 		-dBATCH \
 		-sOutputFile=$@ \
+		-sDocumentUUID="00000000-0000-0000-0000-000000000000" \
 		-dEPSCrop \
-		-f $<
+		$< \
+		$(top-src-dir)/Documentation/no-pdf-dates.ps
 
 # ly-examples/
 $(outdir)/%.png: %.ly
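
To check whether two builds of the same document still differ in the
date or `/ID` fields discussed in the mail, something like the
following rough sketch could be used.  It only scans the raw bytes
with regular expressions, so it assumes that the Info dictionary and
the trailer are stored uncompressed; the helper names and the two
sample byte strings are made up for the illustration, and it does not
look at the subsetted fonts at all.

```python
import re

# Fields that commonly vary between otherwise identical builds.
PATTERNS = {
    "CreationDate": re.compile(rb"/CreationDate\s*\(([^)]*)\)"),
    "ModDate": re.compile(rb"/ModDate\s*\(([^)]*)\)"),
    "ID": re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f]+)>"),
}


def volatile_fields(pdf_bytes: bytes) -> dict[str, bytes]:
    """Return the first value found for each volatile field."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(pdf_bytes)
        if match:
            found[name] = match.group(1)
    return found


def differing_fields(first: bytes, second: bytes) -> list[str]:
    """Name the volatile fields whose values differ between two files."""
    fa, fb = volatile_fields(first), volatile_fields(second)
    return sorted(k for k in fa.keys() | fb.keys() if fa.get(k) != fb.get(k))


# Synthetic stand-ins for two build outputs (not real PDF files).
run1 = b"<< /CreationDate (D:20200101000000Z) >> trailer << /ID [<AAAA> <AAAA>] >>"
run2 = b"<< /CreationDate (D:20200101000000Z) >> trailer << /ID [<BBBB> <BBBB>] >>"

print(differing_fields(run1, run2))
```

For real files, the byte strings would be read with
`open(path, "rb").read()`; an empty result only means that these three
fields agree, not that the PDFs are byte-identical.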
Attachment: no-pdf-dates.ps (PostScript document)