Hi, I am an Apache NiFi developer, and we have a user reporting an issue
regarding the use of Tika in our ExtractDocumentText processor. The user is
noticing that a particular symbol is not being parsed correctly; instead, it
is being translated into either a ? (question mark) or a " (double quote).
Please see NIFI-10218 <https://issues.apache.org/jira/browse/NIFI-10218>
for more details.

Please advise whether there is anything we can do on our side to extract
this text properly, or whether this is a known limitation of parsing PDF
documents.

Thank you!