We already use Python with OpenCV (cv2) to convert the image to grayscale and
binarise it. I also tried erosion, but it showed no marked
improvement. Now for this particular image it would be easy to remove the
left side, but it is merely a sample and the text can occur in any part of
the i
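
For readers unfamiliar with the preprocessing steps mentioned above, here is a minimal, dependency-free sketch of grayscale conversion, Otsu binarisation, and a 3x3 erosion. It is not the poster's actual code: the function names are hypothetical, and the choice of Otsu's method and a 3x3 kernel is an assumption. In OpenCV the corresponding calls would be cv2.cvtColor, cv2.threshold and cv2.erode.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold that maximises between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(gray, t):
    """Pixels brighter than t become white (255), all others black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


def erode(binary):
    """3x3 minimum filter; pixels outside the image are ignored."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            row.append(min(binary[yy][xx]
                           for yy in range(max(0, y - 1), min(h, y + 2))
                           for xx in range(max(0, x - 1), min(w, x + 2))))
        out.append(row)
    return out
```

Note that erosion shrinks white regions, so on dark text over a white background it thickens the strokes rather than thinning them. That may be one reason it does not help on every image.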
I am using 4.1.1 without problems:
(base) zoltansz@mac-mini-m1 Downloads % tesseract --version
29ms
tesseract 4.1.1
leptonica-1.81.1
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 :
libwebp 1
As I wrote, try searching for "text detection" (or "document analysis"). You
will see that it is quite difficult and that there is almost no free/open-source
solution.
Something is implemented in Tesseract, but (in my experience) it fails
for complex pages like the one you provided. That's why the documentation su