[ https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155789#comment-17155789 ]
Clark Perkins commented on TIKA-3131:
-------------------------------------

I'm pretty sure this was just an oversight when copying defaults from PDFBox, so I went ahead and opened a PR to fix them.

> PDFParserConfig default values were accidentally swapped
> --------------------------------------------------------
>
>                 Key: TIKA-3131
>                 URL: https://issues.apache.org/jira/browse/TIKA-3131
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.24.1
>            Reporter: Clark Perkins
>            Priority: Major
>
> When default values were added for averageCharTolerance and spacingTolerance
> as a part of TIKA-3091, their values appear to have been inadvertently
> swapped.
> From PDFBox:
> {noformat}
>     private float spacingTolerance = .5f;
>     private float averageCharTolerance = .3f;
> {noformat}
> From tika 1.24.1:
> {noformat}
>     //The character width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float averageCharTolerance = 0.5f;
>     //The space width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float spacingTolerance = 0.3f;
> {noformat}
> This effective change in defaults has caused PDFParser to start adding more
> spaces than it did in 1.24 and earlier.
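
For illustration only, the swap described above can be sketched in plain Java without Tika or PDFBox on the classpath. The class `ToleranceDefaults`, its nested `Config` class, and the helper methods are hypothetical stand-ins and not Tika's actual code; only the four numeric values come from the report. The sketch assumes the intended defaults are simply the PDFBox values of the same name.

```java
public class ToleranceDefaults {
    // Defaults quoted from PDFBox in the report.
    public static final float PDFBOX_SPACING_TOLERANCE = 0.5f;
    public static final float PDFBOX_AVERAGE_CHAR_TOLERANCE = 0.3f;

    /** Hypothetical stand-in for the two affected PDFParserConfig fields. */
    public static final class Config {
        public final float averageCharTolerance;
        public final float spacingTolerance;

        public Config(float averageCharTolerance, float spacingTolerance) {
            this.averageCharTolerance = averageCharTolerance;
            this.spacingTolerance = spacingTolerance;
        }
    }

    /** Defaults as quoted from Tika 1.24.1 in the report. */
    public static Config tika1241Defaults() {
        return new Config(0.5f, 0.3f);
    }

    /** Assumed intent: each field takes the PDFBox value of the same name. */
    public static Config pdfBoxAlignedDefaults() {
        return new Config(PDFBOX_AVERAGE_CHAR_TOLERANCE, PDFBOX_SPACING_TOLERANCE);
    }

    /** True when both fields equal the PDFBox defaults of the same name. */
    public static boolean matchesPdfBox(Config c) {
        return c.averageCharTolerance == PDFBOX_AVERAGE_CHAR_TOLERANCE
                && c.spacingTolerance == PDFBOX_SPACING_TOLERANCE;
    }

    /** True when each field holds the PDFBox default of the other field. */
    public static boolean isSwapped(Config c) {
        return c.averageCharTolerance == PDFBOX_SPACING_TOLERANCE
                && c.spacingTolerance == PDFBOX_AVERAGE_CHAR_TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println("1.24.1 defaults swapped: " + isSwapped(tika1241Defaults()));
        System.out.println("aligned defaults match PDFBox: " + matchesPdfBox(pdfBoxAlignedDefaults()));
    }
}
```

A check of this shape could serve as a regression guard whenever defaults are copied from an upstream library, since two same-typed fields with similar names are easy to transpose.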