[ https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159282#comment-17159282 ]
Hudson commented on TIKA-3131:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #347 (See [https://builds.apache.org/job/tika-branch-1x/347/])
TIKA-3131 -- swap default values of averageCharTolerance and (tallison: [https://github.com/apache/tika/commit/6fb39c9583e04edf72bd19f800b591b1f49c6497])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java

> PDFParserConfig default values were accidentally swapped
> --------------------------------------------------------
>
>                 Key: TIKA-3131
>                 URL: https://issues.apache.org/jira/browse/TIKA-3131
>             Project: Tika
>          Issue Type: Bug
>          Components: config, parser
>    Affects Versions: 1.24.1
>            Reporter: Clark Perkins
>            Priority: Major
>             Fix For: 1.25
>
>
> When default values were added for averageCharTolerance and spacingTolerance
> as a part of TIKA-3091, their values appear to have been inadvertently
> swapped.
> From PDFBox:
> {noformat}
>     private float spacingTolerance = .5f;
>     private float averageCharTolerance = .3f;
> {noformat}
> From tika 1.24.1:
> {noformat}
>     //The character width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float averageCharTolerance = 0.5f;
>     //The space width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float spacingTolerance = 0.3f;
> {noformat}
> This effective change in defaults has caused PDFParser to start adding more
> spaces than it did in 1.24 and earlier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)