Regrettably, the only way I know of with current Tesseract is to work around
the issue: create a column mask and apply it in a preprocessing step, thus
feeding Tesseract several images for a single page, one for each column,
in which the other columns are Tipp-Exed (whited out, replaced by background
color re

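A minimal sketch of that masking workaround, under stated assumptions: the page is an 8-bit grayscale image held as plain nested lists (so the sketch needs no imaging library), the background is white (255), and columns are separated by blank vertical gutters. The names `find_columns`, `mask_other_columns`, `ink_threshold` and `min_gutter` are hypothetical and not part of Tesseract; the gutter search is only one possible way to obtain the column mask.

```python
WHITE = 255  # assumed background value for an 8-bit grayscale page


def find_columns(page, ink_threshold=128, min_gutter=2):
    """Guess column x-ranges from blank vertical gutters (hypothetical helper).

    A pixel counts as ink when its value is below ink_threshold.
    Returns a list of (start, end) pairs, with end exclusive.
    """
    width = len(page[0])
    # True for every x position where at least one row contains ink.
    has_ink = [any(row[x] < ink_threshold for row in page) for x in range(width)]
    columns, start, blank_run = [], None, 0
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x          # a new column begins here
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gutter:
                # The gutter is wide enough: close the column before it.
                columns.append((start, x - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        # The last column runs to the page edge (minus trailing blanks).
        columns.append((start, width - blank_run))
    return columns


def mask_other_columns(page, columns):
    """Return one full-size image per column, all other columns whited out."""
    images = []
    for start, end in columns:
        images.append([
            [px if start <= x < end else WHITE for x, px in enumerate(row)]
            for row in page
        ])
    return images


# Tiny two-column "page": 0 is ink, 255 is background.
page = [
    [255, 0, 0, 255, 255, 255, 0, 0, 255, 255],
    [255, 0, 255, 255, 255, 255, 255, 0, 255, 255],
]
columns = find_columns(page)
column_images = mask_other_columns(page, columns)
```

Each entry of `column_images` keeps the original page dimensions, so it could be saved as its own image and handed to Tesseract separately, with the recognized text concatenated in column order afterwards. On real scans the threshold and gutter width would need tuning, and skewed or noisy pages may defeat this simple gutter search.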
I have data from various old (1920) magazines that have multiple blocks
of text on a single page. Right now, the OCR reads the text lines straight
across the page, so the output from different columns is interleaved
rather than continuing down into the next column. Is there any way to get the
OCR scanned