I do have the font available as a TTF file. It is probably copyright-protected, but I could post it if that would be useful. No, I need to recognize letters and numbers, and I've been able to extract text from other regions of the images; it's just this region of numbers, periods, and percent signs that is the problem.
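For concreteness, here is a rough sketch of a restricted-character run over a cropped region like this one. It only assembles standard tesseract command-line options (`--psm`, `--oem`, `-c tessedit_char_whitelist=...`). The file name `numbers_region.png`, the helper names, and the particular psm/oem values are placeholders for illustration, not the exact settings already tried.

```python
import shutil
import subprocess


def build_tesseract_args(image_path, psm=7, oem=1, whitelist="0123456789.%"):
    """Assemble a tesseract CLI call for one text line with a restricted character set.

    psm=7 treats the image as a single text line; oem=1 selects the LSTM engine.
    Both defaults are illustrative assumptions, not known-good settings.
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),
        "--oem", str(oem),
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def ocr_region(image_path, **kwargs):
    """Run tesseract on the image and return the recognized text (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_args(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command only; call ocr_region() to actually run it.
    print(" ".join(build_tesseract_args("numbers_region.png")))
```

Looping `ocr_region` over a few psm values (for example 6, 7, and 13) on the same crop is a quick way to compare page segmentation modes before deciding whether retraining is needed.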
Thanks,
~Marvin

On Saturday, March 27, 2021 at 9:50:46 AM UTC-4 shree wrote:

> Do you have the font used in the sample?
> Do you only need to recognise numbers in it?
>
> On Sat, Mar 27, 2021, 16:10 Marvin Thielk <marvin...@gmail.com> wrote:
>
>> I've tried a variety of pre-processing attempts and different configs,
>> but this feels like it should be an easy detection task.
>>
>> I've tried with several different psm and oem settings. Even restricting
>> to numerical characters. Nothing seems to help.
>>
>> Is the next step to re-train it?
>>
>> version info if it helps:
>> tesseract v5.0.0-alpha.20201127
>> leptonica-1.78.0
>> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>> Found AVX2
>> Found AVX
>> Found FMA
>> Found SSE
>> Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6
>> liblz4/1.7.5
>> Found libcurl/7.59.0 OpenSSL/1.0.2o (WinSSL) zlib/1.2.11 WinIDN
>> libssh2/1.7.0 nghttp2/1.31.0
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/1bb67d51-2bd3-4d4e-9ba1-8b39b7f3ee43n%40googlegroups.com