Re: [tesseract-ocr] Text-wrap recognition

2024-08-19 Thread Ger Hobbelt
Regrettably, the only way I know of with current tesseract is to work around the issue: create a column mask and apply it in a preprocessing step, feeding tesseract several images for a single page, one per column, with the other columns Tipp-Exed (whited out, replaced by background color re
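
A minimal sketch of the column-mask workaround described above, not Ger's actual code. It assumes the column boundaries are already known, and it uses a plain nested list of grayscale values as a stand-in for a real image. The name mask_other_columns and its parameters are hypothetical. Each returned copy would then be saved and passed to tesseract as a separate image.

```python
def mask_other_columns(page, column_bounds, background=255):
    """Return one copy of `page` per column, with every other column whited out.

    page: list of rows, each a list of grayscale pixel values (0-255).
    column_bounds: list of (left, right) x-ranges, right exclusive.
                   These are assumed to be known in advance.
    background: pixel value used to paint over the other columns.
    """
    masked_pages = []
    for left, right in column_bounds:
        # Keep pixels inside this column; replace everything else with background.
        masked = [
            [px if left <= x < right else background for x, px in enumerate(row)]
            for row in page
        ]
        masked_pages.append(masked)
    return masked_pages


# Tiny 2-row "page": two columns of dark pixels separated by a white gutter.
page = [
    [0, 0, 255, 10, 10],
    [5, 5, 255, 20, 20],
]
left_only, right_only = mask_other_columns(page, [(0, 2), (3, 5)])
```

Running OCR on left_only and right_only in turn, then concatenating the results, keeps each column's text together instead of reading lines straight across the page.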

[tesseract-ocr] Text-wrap recognition

2024-08-18 Thread Ajg
I have data that comes in from various old (1920) magazines that have multiple blocks of text on a single page. Right now, the OCR reads the text lines straight across the page, so the output is interspersed rather than word-wrapped to the next column. Is there any way to get the OCR scanned