[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewan Mellor updated TIKA-2581:
------------------------------
    Description: 
TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.

With 3.x, the output is <span>Happy</span> but with 4.0 the output is <span><strong>Happy</strong></span>.

  was:
TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.

With 3.x, the output is `<span>Happy</span>` but with 4.0 the output is `<span><strong>Happy</strong></span>`.

> testOCROutputsHOCR fails with Tesseract 4.0
> -------------------------------------------
>
>                 Key: TIKA-2581
>                 URL: https://issues.apache.org/jira/browse/TIKA-2581
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ewan Mellor
>            Priority: Minor
>
> TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
> With 3.x, the output is <span>Happy</span> but with 4.0 the output is
> <span><strong>Happy</strong></span>.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
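The following is a minimal sketch, not taken from the issue or from Tika, of how a version-tolerant check could accept both outputs described in this report: a word directly inside a <span> (Tesseract 3.x) or wrapped in <strong> inside the <span> (Tesseract 4.0). The class HocrWordCheck and its method containsWord are hypothetical names. The sample strings are the simplified spans quoted in the issue; real hOCR spans usually carry attributes such as class and title, which the pattern tolerates.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Tika): accepts a word inside an hOCR <span>
// either bare (3.x style, per this issue) or wrapped in <strong> (4.0 style).
final class HocrWordCheck {

    static boolean containsWord(String hocr, String word) {
        // Quote the word so regex metacharacters in it are matched literally.
        String w = Pattern.quote(word);
        // "<span" plus any attributes and ">", then either "<strong>word</strong>"
        // or the bare word, then "</span>".
        String regex = "<span[^>]*>(?:<strong>" + w + "</strong>|" + w + ")</span>";
        return Pattern.compile(regex).matcher(hocr).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("<span>Happy</span>", "Happy"));                 // true
        System.out.println(containsWord("<span><strong>Happy</strong></span>", "Happy")); // true
        System.out.println(containsWord("<span>Sad</span>", "Happy"));                   // false
    }
}
```

Using an alternation rather than two independent optional tags keeps the check strict about pairing: an unbalanced <span><strong>Happy</span> is rejected. The trade-off is that the assertion no longer notices if a future Tesseract version drops or changes the bold markup.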