Hello- We are using PDFTextStripper, and have found some cases where there are a *lot* of extraneous spaces being added to the output. It almost acts like the stripper is thinking that the space width of the font is super tiny.
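To make that guess concrete, here is a minimal sketch of a gap-based space heuristic. It is only an illustration of the suspected failure mode, not PDFBox's actual algorithm: the class `SpaceHeuristicSketch`, the `Glyph` type, the `join` method, the `tolerance` parameter and all coordinates are hypothetical. A space is emitted whenever the gap between two glyphs exceeds a fraction of the assumed space width, so an unrealistically small space width turns ordinary inter-glyph gaps into spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a gap-based space heuristic; NOT PDFBox's real code.
class SpaceHeuristicSketch {

    // A glyph with its text, left x coordinate and advance width (made-up units).
    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins glyphs left to right, inserting a space when the gap to the
    // previous glyph's right edge exceeds spaceWidth * tolerance.
    public static String join(List<Glyph> glyphs, float spaceWidth, float tolerance) {
        StringBuilder sb = new StringBuilder();
        float previousEnd = Float.NaN;
        for (Glyph g : glyphs) {
            if (!Float.isNaN(previousEnd) && g.x - previousEnd > spaceWidth * tolerance) {
                sb.append(' ');
            }
            sb.append(g.text);
            previousEnd = g.x + g.width;
        }
        return sb.toString();
    }

    // Made-up positions for "(999)" with a gap of 0.5 units between glyphs.
    public static List<Glyph> sampleLine() {
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("(", 0f, 3f));
        line.add(new Glyph("9", 3.5f, 5f));
        line.add(new Glyph("9", 9f, 5f));
        line.add(new Glyph("9", 14.5f, 5f));
        line.add(new Glyph(")", 20f, 3f));
        return line;
    }

    public static void main(String[] args) {
        // Plausible space width: the 0.5 gaps stay below the 1.25 threshold.
        System.out.println(join(sampleLine(), 2.5f, 0.5f));  // prints "(999)"
        // Tiny space width: every 0.5 gap exceeds the 0.125 threshold.
        System.out.println(join(sampleLine(), 0.25f, 0.5f)); // prints "( 9 9 9 )"
    }
}
```

With the tiny space width the sketch produces the same kind of output we see from the stripper, which is why we suspect the space width rather than the glyph positions.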
I managed to get a document that exhibits the behavior: https://drive.google.com/file/d/1B2Mc4mMdsYfk9jKVqQ9OxEhKLRAxprrU/view?usp=sharing

The easiest way to see the behavior is in PDFDebugger, View->Show Stripper Text Positions. Note that in the lower left corner of the document there is the text "999". The text above and below it is fine, but the line with 999 has a *ton* of extra space rectangles displayed.

The extract text function in PDFDebugger doesn't sort, so that one comes out fine, but if you use PDFTextStripper with sorting enabled (), the line renders like this:

Withdrawals and distributions . . . $ ( 9 9 9 )

Note the many space characters, and that there are even spaces between each 9.

I also observe warning messages about fonts when the PDF is processed (not sure if this might be involved):

[main] WARN org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font ArialMT for HelveticaLTStd-Roman
[main] WARN org.apache.fontbox.ttf.CmapSubtable - Format 14 cmap table is not supported and will be ignored

It almost acts like the parentheses on the line are triggering some different detection mode where the font's space width is computed to be much smaller than it should be.

Any ideas on what is going on or if it is fixable?

Thanks!
- K