Hi team, thank you for your great work on PDFBox! I want to report an issue with PDF parsing/rendering.
In production, we encountered a PDF file that PDFBox does not render properly: the output looks as if it were cut off in the middle. Acrobat and pdf.js, on the other hand, render it without any problem.

I investigated the issue. PDFBox reports a warning at a specific offset, which falls in the middle of a string operand of a TJ operator. Interestingly, the string contains the byte sequence "\\)\n>" (hex: 5C 29 0A 3E) around that offset. I found that PDFBox has special handling <https://github.com/apache/pdfbox/blob/2.0.28/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java#L480> for this byte sequence, which seems to explain our issue perfectly.

Looking at the comment <https://github.com/apache/pdfbox/blob/2.0.28/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/BaseParser.java#L365>, I understand that it is trying to work around a bug in some PDF producer. However, it now causes a rendering error for properly generated PDF files.

Is there something we can do to get our PDFs rendered correctly?

--
Yuxiao Zeng
Staff Engineering Manager
Healthcare Information Technologist
Flatiron Health K.K.
https://flatiron.co.jp
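
P.S. To illustrate why the byte sequence 5C 29 0A 3E is valid inside a literal string: in PDF syntax, a backslash escapes the byte that follows it, so the 29 (")") after 5C does not close the string. Below is a minimal sketch of that rule as I understand it from the PDF specification. It is not PDFBox's implementation; the class and method names are hypothetical and chosen only for this example.

```java
public class LiteralStringEnd {
    /**
     * Returns the index of the ")" that closes the PDF literal string
     * opening at data[start], or -1 if data[start] is not "(" or the
     * string is unterminated. Hypothetical helper for illustration only.
     */
    public static int findLiteralStringEnd(byte[] data, int start) {
        if (start < 0 || start >= data.length || data[start] != '(') {
            return -1;
        }
        int depth = 0;
        for (int i = start; i < data.length; i++) {
            byte b = data[i];
            if (b == '\\') {
                // A backslash escapes the next byte, so "\)" and "\(" never
                // change the nesting depth. Skipping one byte is also enough
                // for octal escapes, since digits are not parentheses.
                i++;
            } else if (b == '(') {
                // Unescaped parentheses may nest inside a literal string.
                depth++;
            } else if (b == ')') {
                depth--;
                if (depth == 0) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

With the input bytes `( a 5C 29 0A 3E b ) Tj`, this sketch treats the string as ending at the second ")", not at the escaped one, which matches how Acrobat and pdf.js appear to read our file.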

