Problem parsing a particular character from a PDF file

Dan S Fri, 06 Mar 2026 11:27:17 -0800

Hi, I am an Apache NiFi developer, and we have a user reporting an issue
with the use of Tika in our ExtractDocumentText processor. The user has
noticed that a particular symbol is not parsed correctly; instead, it is
converted to either a ? (question mark) or a " (double quote).
Please see NIFI-10218 <https://issues.apache.org/jira/browse/NIFI-10218>
for more details.


Please advise whether there is anything we can do on our side to extract
this text properly, or whether this is a known limitation of parsing PDF
documents.

Thank you!
