[ https://issues.apache.org/jira/browse/PDFBOX-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720223#comment-17720223 ]
Andreas Lehmkühler commented on PDFBOX-5595:
--------------------------------------------

[~tilman] thanks for double checking

> Slight regression on corrupt bug tracker file
> ---------------------------------------------
>
>                 Key: PDFBOX-5595
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5595
>             Project: PDFBox
>          Issue Type: Task
>          Components: Parsing
>    Affects Versions: 2.0.28, 3.0.0 PDFBox
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Trivial
>             Fix For: 2.0.29, 3.0.0 PDFBox
>
>
> I'm not sure this is a regression, and apologies if you already dealt with
> this before the release of 2.0.28. Also, as a warning, this file is corrupt.
>
> We used to get more text out of this file in 2.0.27 than we do now in 2.0.28:
> [https://corpora.tika.apache.org/base/docs/bug_trackers/evince/evince-395-0.zip-0.pdf]
>
> This file derived from the evince bug tracker, which now eventually links to
> this issue:
> [https://gitlab.freedesktop.org/poppler/poppler/-/issues/323]
>
> This image from the poppler issue shows what we get with PDFBox 2.0.28 on the
> left, and 2.0.27 on the right.
>
> If the decision is "the file is corrupt -> not going to fix", I completely
> understand.
> !https://gitlab.gnome.org/GNOME/evince/uploads/0bc2302dbafc0bbc2110f0d42951428e/evince.JPG!