[jira] [Commented] (TIKA-2581) testOCROutputsHOCR fails with Tesseract 4.0

ASF GitHub Bot (Jira) Mon, 21 Oct 2019 14:41:44 -0700


    [ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956468#comment-16956468 ]


ASF GitHub Bot commented on TIKA-2581:
--------------------------------------

epugh commented on issue #221: Fix for TIKA-2581 contributed by ewanmellor.
URL: https://github.com/apache/tika/pull/221#issuecomment-544719955
 
 
   At this point, does it make sense to support Tesseract 3 when running tests?
   Maybe update the documentation at
https://cwiki.apache.org/confluence/display/TIKA/TikaOCR to note that the output
format is slightly different?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
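On the question of keeping Tesseract 3 in the tests: one option is to choose the expected hOCR fragment from the installed Tesseract's major version. The following Java sketch is illustrative only. The class and method names are hypothetical. It assumes the first line of `tesseract --version` reads like `tesseract 3.05.02` or `tesseract 4.1.1`, and it hard-codes the two fragments reported in the issue: `<span>Happy</span>` for 3.x and `<span><strong>Happy</strong></span>` for 4.0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractVersionSketch {

    // Assumption: the first line of `tesseract --version` looks like
    // "tesseract 3.05.02" or "tesseract 4.1.1" (an optional "v" prefix is tolerated).
    private static final Pattern VERSION = Pattern.compile("tesseract\\s+v?(\\d+)\\.");

    /** Returns the major version parsed from a version line, or -1 if it cannot be parsed. */
    static int majorVersion(String versionLine) {
        Matcher m = VERSION.matcher(versionLine.trim());
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /**
     * Hypothetical helper: the fragment a test would expect for the word "Happy",
     * based on the outputs reported in TIKA-2581. Unknown versions (-1) fall back to 3.x.
     */
    static String expectedHappySpan(int major) {
        return major >= 4 ? "<span><strong>Happy</strong></span>" : "<span>Happy</span>";
    }

    public static void main(String[] args) {
        System.out.println(expectedHappySpan(majorVersion("tesseract 3.05.02"))); // <span>Happy</span>
        System.out.println(expectedHappySpan(majorVersion("tesseract 4.1.1")));   // <span><strong>Happy</strong></span>
    }
}
```

Here an unparseable version line falls back to the 3.x expectation. A real test might instead skip when the version is unknown, for example with a JUnit 4 `Assume.assumeTrue(...)` check.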


> testOCROutputsHOCR fails with Tesseract 4.0
> -------------------------------------------
>
>                 Key: TIKA-2581
>                 URL: https://issues.apache.org/jira/browse/TIKA-2581
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ewan Mellor
>            Priority: Minor
>
> TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
> With 3.x, the output is <span>Happy</span>, but with 4.0 it is
> <span><strong>Happy</strong></span>.
>  
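Instead of branching on the Tesseract version, the test could accept either reported form. Below is a minimal Java sketch of such a version-tolerant check. `HocrWordMatchSketch` and `containsWord` are hypothetical names, and this is not necessarily what PR #221 does. It also assumes the word's `<span>` may carry attributes, as hOCR `ocrx_word` spans typically do.

```java
import java.util.regex.Pattern;

public class HocrWordMatchSketch {

    /**
     * True if xhtml contains a <span> (attributes allowed) whose content is exactly
     * the word, either bare (Tesseract 3.x style) or wrapped in <strong> (4.0 style).
     */
    static boolean containsWord(String xhtml, String word) {
        String q = Pattern.quote(word); // treat the word literally, not as a regex
        Pattern p = Pattern.compile(
                "<span[^>]*>\\s*(?:<strong>\\s*" + q + "\\s*</strong>|" + q + ")\\s*</span>");
        return p.matcher(xhtml).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("<span>Happy</span>", "Happy"));                  // true
        System.out.println(containsWord("<span><strong>Happy</strong></span>", "Happy")); // true
        System.out.println(containsWord("<span>Sad</span>", "Happy"));                    // false
    }
}
```

Requiring a matched `<strong>...</strong>` pair (rather than two independent optional tags) keeps the check from accepting malformed output such as `<span><strong>Happy</span>`.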



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
