[jira] [Updated] (TIKA-4270) wrong skew angle in tika-parser-ocr-module

Tilman Hausherr (Jira) Thu, 20 Jun 2024 20:52:10 -0700


     [ 
https://issues.apache.org/jira/browse/TIKA-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr updated TIKA-4270:
----------------------------------
    Description: 
We use Tika to extract text from different sources, including images in which 
the text is rotated at an angle. Before running OCR on an image, Tika first 
deskews it, but the skew angle is not calculated correctly. In the example 
[^for_issue] (PNG file), the text is rotated by about 40 degrees, but the 
skew angle function 
(org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle 
of about 15 degrees. The slope angle calculation flag is enabled.

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: there is a TODO 
box, and the only relevant information is for Tika 1, which requires Python 
and its libraries, whereas in the version of Tika we use the angle calculation 
is implemented in Java only.

  was:
We use Tika to extract text from different sources, including images in which 
the text is rotated at an angle. Before running OCR on an image, Tika first 
deskews it, but the skew angle is not calculated correctly. In the example 
[^for_issue], the text is rotated by about 40 degrees, but the skew 
angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) 
returns an angle of about 15 degrees. The slope angle calculation flag is enabled.

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: there is a TODO 
box, and the only relevant information is for Tika 1, which requires Python 
and its libraries, whereas in the version of Tika we use the angle calculation 
is implemented in Java only.


> wrong skew angle in tika-parser-ocr-module
> ------------------------------------------
>
>                 Key: TIKA-4270
>                 URL: https://issues.apache.org/jira/browse/TIKA-4270
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Roman
>            Priority: Major
>         Attachments: for_issue
>
>
> We use Tika to extract text from different sources, including images in 
> which the text is rotated at an angle. Before running OCR on an image, Tika 
> first deskews it, but the skew angle is not calculated correctly. In the 
> example [^for_issue] (PNG file), the text is rotated by about 40 degrees, 
> but the skew angle function 
> (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle 
> of about 15 degrees. The slope angle calculation flag is enabled.
> The documentation 
> (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
>  does not have sufficient information for this version of Tika: there is a 
> TODO box, and the only relevant information is for Tika 1, which requires 
> Python and its libraries, whereas in the version of Tika we use the angle 
> calculation is implemented in Java only.
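
One way to cross-check the reported angle independently of Tika is a 
projection-profile estimate: for each candidate angle, project the dark pixels 
onto the axis perpendicular to that angle and keep the angle whose histogram 
is most sharply peaked. The sketch below is not Tika's algorithm. The class 
SkewCheck, its methods, the synthetic test page, and the search-range 
parameters are all assumptions made for illustration, and the sign convention 
may differ from ImageDeskew#getSkewAngle.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class SkewCheck {

    /** Hypothetical test page: nine black bars ("text lines") rotated by angleDeg. */
    public static BufferedImage syntheticPage(int size, double angleDeg) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, size, size);
        // Rotate around the image centre, then draw horizontal bars.
        g.rotate(Math.toRadians(angleDeg), size / 2.0, size / 2.0);
        g.setColor(Color.BLACK);
        for (int y = size / 2 - 80; y <= size / 2 + 80; y += 20) {
            g.fillRect(size / 2 - 100, y, 200, 2);
        }
        g.dispose();
        return img;
    }

    /** Projection-profile skew estimate in degrees, searched over [-maxAbsDeg, +maxAbsDeg]. */
    public static double estimateSkew(BufferedImage img, double maxAbsDeg, double stepDeg) {
        int w = img.getWidth(), h = img.getHeight();
        // Collect the coordinates of all dark pixels once.
        int[] xs = new int[w * h];
        int[] ys = new int[w * h];
        int n = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if ((img.getRGB(x, y) & 0xFF) < 128) {
                    xs[n] = x;
                    ys[n] = y;
                    n++;
                }
            }
        }
        int diag = (int) Math.ceil(Math.hypot(w, h));
        int steps = (int) Math.round(2 * maxAbsDeg / stepDeg);
        double bestAngle = 0.0;
        long bestScore = -1;
        for (int i = 0; i <= steps; i++) {
            double deg = -maxAbsDeg + i * stepDeg;
            double cos = Math.cos(Math.toRadians(deg));
            double sin = Math.sin(Math.toRadians(deg));
            // Histogram of each dark pixel's offset perpendicular to the candidate direction.
            long[] bins = new long[2 * diag + 1];
            for (int k = 0; k < n; k++) {
                int r = (int) Math.round(ys[k] * cos - xs[k] * sin);
                bins[r + diag]++;
            }
            // Sum of squares is largest when the pixels fall into few bins.
            long score = 0;
            for (long b : bins) {
                score += b * b;
            }
            if (score > bestScore) {
                bestScore = score;
                bestAngle = deg;
            }
        }
        return bestAngle;
    }

    public static void main(String[] args) {
        BufferedImage img = syntheticPage(400, 40.0);
        System.out.println("search within +/-60 degrees: " + estimateSkew(img, 60, 0.5));
        System.out.println("search within +/-20 degrees: " + estimateSkew(img, 20, 0.5));
    }
}
```

To check the attached image instead of the synthetic page, load it with 
javax.imageio.ImageIO.read and pass it to estimateSkew. Note that a search 
window narrower than the true rotation cannot report that rotation; whether 
something similar happens inside ImageDeskew is not established by this report.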



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
