[ https://issues.apache.org/jira/browse/TIKA-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated TIKA-4270:
----------------------------------
    Description:
We use Tika to extract text from different sources, including images containing text that is rotated by some angle. To extract text from an image with OCR, Tika first deskews the image, but the skew angle is not calculated correctly. In the example [^for_issue] (PNG file), the text is rotated by about 40 degrees, yet the skew angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle of about 15 degrees. The skew angle calculation flag is enabled.
The documentation (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation) does not contain sufficient information for this version of Tika: there is a TODO box, and what relevant information there is applies to Tika 1 (which requires Python and its libraries, whereas in the version of Tika we use, the angle calculation is implemented only in Java).

  was:
We use Tika to extract text from different sources, including images containing text that is rotated by some angle. To extract text from an image with OCR, Tika first deskews the image, but the skew angle is not calculated correctly. In the example [^for_issue], the text is rotated by about 40 degrees, yet the skew angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle of about 15 degrees. The skew angle calculation flag is enabled.
The documentation (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation) does not contain sufficient information for this version of Tika: there is a TODO box, and what relevant information there is applies to Tika 1 (which requires Python and its libraries, whereas in the version of Tika we use, the angle calculation is implemented only in Java).

> wrong skew angle in tika-parser-ocr-module
> ------------------------------------------
>
>                 Key: TIKA-4270
>                 URL: https://issues.apache.org/jira/browse/TIKA-4270
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Roman
>            Priority: Major
>         Attachments: for_issue
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
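The report hinges on how the deskew step estimates its angle. As far as we understand, the tess4j ImageDeskew class, from which Tika's org.apache.tika.parser.ocr.tess4j.ImageDeskew appears to be derived (judging by its package name), uses a Hough transform: every dark pixel votes for candidate line angles, and the best-supported angle wins. The Java sketch below is our own illustration, not Tika's code; SkewSketch, estimateSkewDegrees and syntheticLines are hypothetical names, and the search window [minDeg, maxDeg] is a parameter we chose for the example. It demonstrates one failure mode worth checking against Tika's source: an estimator that only searches a limited angle window cannot report a rotation outside that window, so text rotated by ~40 degrees would necessarily come back as some smaller angle.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative Hough-transform skew estimator (hypothetical; NOT Tika's ImageDeskew).
final class SkewSketch {

    /**
     * Estimates the dominant line angle in degrees from dark-pixel coordinates {x, y}.
     * A line at angle a is y = x*tan(a) + c (y grows downward, as in image coordinates).
     * Only angles in [minDeg, maxDeg] are tried, so a rotation outside that window
     * can never be reported.
     */
    static double estimateSkewDegrees(List<int[]> darkPixels, int width, int height,
                                      double minDeg, double maxDeg, double stepDeg) {
        int steps = (int) Math.round((maxDeg - minDeg) / stepDeg) + 1;
        // |y*cos(a) - x*sin(a)| < width + height for pixels inside the image,
        // so offsetting by dMax keeps every accumulator index non-negative.
        int dMax = width + height;
        int[][] votes = new int[steps][2 * dMax + 1];
        int bestVotes = -1;
        double bestDeg = minDeg;
        for (int s = 0; s < steps; s++) {
            double deg = minDeg + s * stepDeg;
            double a = Math.toRadians(deg);
            double cos = Math.cos(a);
            double sin = Math.sin(a);
            for (int[] p : darkPixels) {
                // Points on y = x*tan(a) + c all share d = c*cos(a), up to rounding.
                int d = (int) Math.round(p[1] * cos - p[0] * sin) + dMax;
                int v = ++votes[s][d];
                if (v > bestVotes) {
                    bestVotes = v;
                    bestDeg = deg;
                }
            }
        }
        return bestDeg;
    }

    /** Hypothetical test input: parallel lines at angleDeg, `length` pixels long, `gap` pixels apart. */
    static List<int[]> syntheticLines(double angleDeg, int lines, int length, int gap) {
        List<int[]> pts = new ArrayList<>();
        double t = Math.tan(Math.toRadians(angleDeg));
        for (int k = 0; k < lines; k++) {
            for (int x = 0; x < length; x++) {
                pts.add(new int[] {x, (int) Math.round(x * t + k * gap)});
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        List<int[]> tilted40 = syntheticLines(40, 5, 300, 30);
        // Narrow window: the true 40-degree rotation lies outside the search range.
        System.out.println("window [-20, 20]: " + estimateSkewDegrees(tilted40, 400, 400, -20, 20, 1));
        // Wider window: the 40-degree lines can now be found.
        System.out.println("window [-45, 45]: " + estimateSkewDegrees(tilted40, 400, 400, -45, 45, 1));
    }
}
```

Running main feeds the same synthetic 40-degree input through a narrow and a wide window; only the wide window is able to return 40. Whether Tika's ImageDeskew restricts its search window, and to what range, has to be checked in its source; the sketch only shows why such a restriction would produce an angle smaller than the real rotation.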