Source: ghostscript, ocrmypdf
Control: found -1 ghostscript/9.56.0~dfsg-1
Control: found -1 ocrmypdf/13.4.0+dfsg-1
Severity: serious
Tags: sid bookworm
User: [email protected]
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of ghostscript, the autopkgtest of ocrmypdf fails in
testing when that autopkgtest is run with the binary packages of
ghostscript from unstable. It passes when run with only packages from
testing. In tabular form:

                       pass            fail
ghostscript            from testing    9.56.0~dfsg-1
ocrmypdf               from testing    13.4.0+dfsg-1
all others             from testing    from testing

I copied some of the output at the bottom of this report.
Currently this regression is blocking the migration of ghostscript to
testing [1]. Due to the nature of this issue, I filed this bug report
against both packages. Can you please investigate the situation and
reassign the bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=ghostscript

https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocrmypdf/20818050/log.gz

=================================== FAILURES ===================================
________________________________ test_force_ocr ________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_force_ocr0/out.pdf')

    def test_force_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-f',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
        assert pdfinfo[0].has_text
E       assert False
E        +  where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=400.000000x400.000000 has_text=False>.has_text

tests/test_main.py:83: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 62.30page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 50%|█████ | 0.5/1.0 [00:02<00:02, 5.47s/page]
OCR: 100%|██████████| 1.0/1.0 [00:02<00:00, 2.75s/page]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 74.34image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:275 page already has text! - rasterizing text and running OCR anyway
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.52 savings: 34.1%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
WARNING ocrmypdf._validation:_validation.py:381 The output file size is 2.45× larger than the input file.
Possible reasons for this include:
The argument --force-ocr was issued, causing transcoding.
The optional dependency 'jbig2' was not found, so some image optimizations could not be attempted.
PDF/A conversion was enabled. (Try `--output-type pdf`.)
Plugins were used.
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:01<00:00, 1.20s/page]
________________________________ test_skip_ocr _________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_skip_ocr0/out.pdf')

    def test_skip_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-s',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
        assert pdfinfo[0].has_text
E       assert False
E        +  where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text

tests/test_main.py:95: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 70.71page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00, 47.12page/s]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 235.24image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:287 skipping all processing on this page
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.14 savings: 12.6%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00, 4.16page/s]
________________________________ test_redo_ocr _________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_redo_ocr0/out.pdf')

    def test_redo_ocr(resources, outpdf):
        in_ = resources / 'graph_ocred.pdf'
        before = PdfInfo(in_, detailed_analysis=True)
        out = outpdf
        out = check_ocrmypdf(in_, out, '--redo-ocr')
        after = PdfInfo(out, detailed_analysis=True)
        assert before[0].has_text and after[0].has_text
E       assert (True and False)
E        +  where True = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=True>.has_text
E        +  and   False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text

tests/test_main.py:104: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 20.63page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 50%|█████ | 0.5/1.0 [00:04<00:04, 8.64s/page]
OCR: 100%|██████████| 1.0/1.0 [00:04<00:00, 4.35s/page]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 254.88image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:284 redoing OCR
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 GPL Ghostscript 9.56.0 (2022-03-29)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
The following warnings were encountered at least once while processing this file:
number uses illegal exponent form
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 This file had errors that were repaired or ignored.
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 The file was produced by:
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 >>>> GPL Ghostscript 9.15 <<<<
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 Please notify the author of the software that produced this
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 file that it does not conform to Adobe's published PDF
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 specification.
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.14 savings: 12.6%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00, 3.91page/s]
=========================== short test summary info ============================
FAILED tests/test_main.py::test_force_ocr - assert False
FAILED tests/test_main.py::test_skip_ocr - assert False
FAILED tests/test_main.py::test_redo_ocr - assert (True and False)
======= 3 failed, 274 passed, 37 skipped, 4 xfailed in 397.41s (0:06:37) =======
autopkgtest [08:17:33]: test test-suite
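
For anyone triaging the full log linked above, the short test summary can be
pulled out mechanically. The following is only a minimal sketch, assuming the
log has been downloaded and decompressed into a string; the helper names
failed_tests and outcome_counts are hypothetical and are not part of ocrmypdf,
pytest, or autopkgtest.

```python
import re

# Matches entries such as
# "FAILED tests/test_main.py::test_force_ocr - assert False".
FAILED_RE = re.compile(r"FAILED (?P<file>\S+?)::(?P<test>\S+) - ")

# Matches the totals, e.g. "3 failed, 274 passed, 37 skipped, 4 xfailed".
COUNT_RE = re.compile(r"(\d+) (failed|passed|skipped|xfailed)\b")


def failed_tests(log_text):
    """Return (file, test name) pairs for every FAILED entry in log_text."""
    return [(m["file"], m["test"]) for m in FAILED_RE.finditer(log_text)]


def outcome_counts(log_text):
    """Return a dict mapping outcome name to count, taken from the totals."""
    return {kind: int(number) for number, kind in COUNT_RE.findall(log_text)}


# Summary text copied from the log in this report.
summary = (
    "FAILED tests/test_main.py::test_force_ocr - assert False\n"
    "FAILED tests/test_main.py::test_skip_ocr - assert False\n"
    "FAILED tests/test_main.py::test_redo_ocr - assert (True and False)\n"
    "======= 3 failed, 274 passed, 37 skipped, 4 xfailed in 397.41s (0:06:37) ======="
)

for file_name, test_name in failed_tests(summary):
    print(f"{file_name}: {test_name}")
print(outcome_counts(summary))
```

All three failing tests live in tests/test_main.py and assert has_text on the
first page of the output PDF, which is why they are grouped here.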

