Please upload the file somewhere. If you've used PDFDebugger before, have a look here:
https://issues.apache.org/jira/browse/PDFBOX-3248
and then check whether your content stream shows the same problem.

Tilman

On 31.05.2016 at 15:22, Augusto Ribeiro Silva wrote:
Hi all,

I am using the PDFBox Java library to read the content of some PDFs, and it
seems to insert some weird (hyphen-like) spacing. I get the same result using
the PDFBox-App command line utility.

The es tab lish ment of an in te grated Part ner Re la tion ship Man age ment 
(PRM) sys tem can po ten tially ad dress sev eral as pets

When I extract text from the same PDF using the pdftotext command line
utility, the text comes out correctly:
The establishment of an integrated Partner Relationship Management (PRM) system 
can potentially address several aspects
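
A quick way to check that two extractions differ only in inserted spaces, and not in the characters themselves, is to compare them with all whitespace removed. The following is a minimal sketch in plain Java; the class and method names are hypothetical and are not part of PDFBox or Tika, and the two strings are the first parts of the samples above.

```java
// Hypothetical helper for comparing two text extractions; not part of PDFBox or Tika.
class SpacingCheck {

    // Returns true when both strings are identical once all whitespace is removed.
    static boolean sameIgnoringWhitespace(String a, String b) {
        return a.replaceAll("\\s+", "").equals(b.replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        // First part of the PDFBox output quoted above.
        String fromPdfBox =
            "The es tab lish ment of an in te grated Part ner Re la tion ship Man age ment";
        // First part of the pdftotext output quoted above.
        String fromPdftotext =
            "The establishment of an integrated Partner Relationship Management";

        System.out.println(sameIgnoringWhitespace(fromPdfBox, fromPdftotext));
    }
}
```

If the comparison holds, the extracted characters agree and only the word-break detection differs between the two tools.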

Does anybody have an idea why PDFBox behaves this way, and any tips for
fixing it? I am using Tika, but as I understand it, Tika uses PDFBox for PDF
processing underneath.

Best regards,
Augusto
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


