[ 
https://issues.apache.org/jira/browse/TIKA-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159282#comment-17159282
 ] 

Hudson commented on TIKA-3131:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #347 (See 
[https://builds.apache.org/job/tika-branch-1x/347/])
TIKA-3131 -- swap default values of averageCharTolerance and (tallison: 
[https://github.com/apache/tika/commit/6fb39c9583e04edf72bd19f800b591b1f49c6497])
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java


> PDFParserConfig default values were accidentally swapped
> --------------------------------------------------------
>
>                 Key: TIKA-3131
>                 URL: https://issues.apache.org/jira/browse/TIKA-3131
>             Project: Tika
>          Issue Type: Bug
>          Components: config, parser
>    Affects Versions: 1.24.1
>            Reporter: Clark Perkins
>            Priority: Major
>             Fix For: 1.25
>
>
> When default values were added for averageCharTolerance and spacingTolerance 
> as a part of TIKA-3091, their values appear to have been inadvertently 
> swapped.
> From PDFBox:
> {noformat}
>     private float spacingTolerance = .5f;
>     private float averageCharTolerance = .3f;
> {noformat}
> From Tika 1.24.1:
> {noformat}
>     //The character width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float averageCharTolerance = 0.5f;
>     //The space width-based tolerance value used to estimate where spaces in text should be added
>     //Default taken from PDFBox.
>     private Float spacingTolerance = 0.3f;
> {noformat}
> This effective change in defaults has caused PDFParser to start adding more 
> spaces than it did in 1.24 and earlier.
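
The swap described above can be checked mechanically. The following is a minimal sketch, not code from Tika or PDFBox: the class ToleranceDefaults and the method matchesPdfBox are hypothetical names introduced for illustration, and the only values used are the ones quoted in the description.

```java
// Hypothetical stand-in for illustration only; this is NOT the real
// org.apache.tika.parser.pdf.PDFParserConfig.
public class ToleranceDefaults {
    // Defaults quoted from PDFBox in the issue description.
    public static final float PDFBOX_SPACING_TOLERANCE = 0.5f;
    public static final float PDFBOX_AVERAGE_CHAR_TOLERANCE = 0.3f;

    // Defaults quoted from Tika 1.24.1 in the issue description (swapped).
    public static final float TIKA_1_24_1_AVERAGE_CHAR_TOLERANCE = 0.5f;
    public static final float TIKA_1_24_1_SPACING_TOLERANCE = 0.3f;

    /** Returns true only when both values equal the PDFBox defaults. */
    public static boolean matchesPdfBox(float averageCharTolerance, float spacingTolerance) {
        return averageCharTolerance == PDFBOX_AVERAGE_CHAR_TOLERANCE
                && spacingTolerance == PDFBOX_SPACING_TOLERANCE;
    }

    public static void main(String[] args) {
        // The 1.24.1 defaults do not match PDFBox.
        System.out.println(matchesPdfBox(
                TIKA_1_24_1_AVERAGE_CHAR_TOLERANCE, TIKA_1_24_1_SPACING_TOLERANCE));
        // Swapping the two values back does.
        System.out.println(matchesPdfBox(
                TIKA_1_24_1_SPACING_TOLERANCE, TIKA_1_24_1_AVERAGE_CHAR_TOLERANCE));
    }
}
```

A comparison like this, run against the real config class, could serve as a regression check that the Tika defaults stay aligned with PDFBox.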



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
