Re: [tesseract-ocr] Re: Reading Inconsistently Spaced Text on a busy image

2021-10-22 Thread Schuyler Reinken
We already use Python OpenCV (cv2) to strip the color from the image and binarise it. I also tried erosion, but it showed no marked improvement. For this particular image it would be easy to remove the left side, but it is merely a sample, and the text can occur in any part of the i

[tesseract-ocr] Re: Is tesseract 3, 4 and 5 supported for Apple M1?

2021-10-22 Thread Zoltan Szalontay
I am using 4.1.1 without problems:

(base) zoltansz@mac-mini-m1 Downloads % tesseract --version
tesseract 4.1.1
 leptonica-1.81.1
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1

Re: [tesseract-ocr] Re: Reading Inconsistently Spaced Text on a busy image

2021-10-22 Thread Zdenko Podobny
As I wrote, try searching for "text detection" (or document analysis). You will see it is quite difficult and there is almost no free/open-source solution. Something is implemented in tesseract, but (in my experience) it fails for complex pages like the one you provided. That's why the documentation su