Oops, I missed this delivery failure. The TTF file is too large to attach
because it contains Asian characters. I can upload it somewhere if you're
interested, but I plan on training a model for my own edification. Original
message below:

This is awesome, thank you so much!

What hyperparameters did you use for training? Number of pages? Epochs?

Which model did you start with? Your file seems smaller than other
eng.traineddata files.

Thanks,
~Marvin
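
For anyone following along, here is a rough sketch of what a fine-tuning run
with tesstrain might look like. The quoted reply below does not state its
settings, so every default here is an assumption: the model name engtuned is
taken from the -l argument in the reply, while the starting model (eng), the
tessdata path, and the iteration count are placeholders. The function only
prints the command; it does not run it.

```shell
# Print (do not run) a tesstrain "make training" invocation for fine-tuning.
# All defaults are placeholders, not the settings actually used in the thread.
build_finetune_cmd() {
  ft_model="${1:-engtuned}"                   # output model name (matches -l engtuned)
  ft_start="${2:-eng}"                        # ASSUMED starting model
  ft_tessdata="${3:-/path/to/tessdata_best}"  # hypothetical dir holding the start model
  ft_iters="${4:-400}"                        # hypothetical iteration budget
  printf 'make training MODEL_NAME=%s START_MODEL=%s TESSDATA=%s MAX_ITERATIONS=%s\n' \
    "$ft_model" "$ft_start" "$ft_tessdata" "$ft_iters"
}

build_finetune_cmd
```

As far as I know, tesstrain expects line images and matching .gt.txt
transcriptions under data/MODEL_NAME-ground-truth by default, so the font
would first have to be rendered into such pairs.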

On Sun, Mar 28, 2021 at 10:16 AM Shree Devi Kumar <shreesh...@gmail.com>
wrote:

> Finetuning with the font will help.
>
> I retrained using "Oleo Script Swash Caps Bold" font which had
> numerals similar to the test image. And the numbers get recognized now.
>
> (base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract 717-300.png -
> V7
> (base) ubuntu@tesseract-ocr-1:~/TEST$ tesseract 717-300.png -
> --tessdata-dir /home/ubuntu/tesstrain/data/   -l engtuned
> Failed to load any lstm-specific dictionaries for lang engtuned!!
> 717
>
> The finetuned traineddata file is attached.
>
> On Sat, Mar 27, 2021 at 10:14 PM Marvin Thielk <marvin.thi...@gmail.com>
> wrote:
>
>> I do have the font available as a TTF file. It is probably copyright
>> protected, but I could post it if it would be useful.
>> No, I need to recognize letters and numbers, and I've been able to extract
>> text from other regions of the images; it's just this region of numbers,
>> '.' and '%' characters.
>>
>> Thanks,
>> ~Marvin
>>
>> On Saturday, March 27, 2021 at 9:50:46 AM UTC-4 shree wrote:
>>
>>> Do you have the font used in the sample?
>>> Do you only need to recognise numbers in it?
>>>
>>> On Sat, Mar 27, 2021, 16:10 Marvin Thielk <marvin...@gmail.com> wrote:
>>>
>>>> I've tried a variety of pre-processing attempts and different configs,
>>>> but this feels like it should be an easy detection task.
>>>>
>>>> I've tried several different psm and oem settings, and even
>>>> restricting the output to numeric characters. Nothing seems to help.
>>>>
>>>> Is the next step to re-train it?
>>>>
>>>> version info if it helps:
>>>> tesseract v5.0.0-alpha.20201127
>>>>  leptonica-1.78.0
>>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
>>>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found FMA
>>>>  Found SSE
>>>>  Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6
>>>> liblz4/1.7.5
>>>>  Found libcurl/7.59.0 OpenSSL/1.0.2o (WinSSL) zlib/1.2.11 WinIDN
>>>> libssh2/1.7.0 nghttp2/1.31.0
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1bb67d51-2bd3-4d4e-9ba1-8b39b7f3ee43n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1bb67d51-2bd3-4d4e-9ba1-8b39b7f3ee43n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>


-- 
Marvin Thielk
Neuroscience PhD candidate at UCSD
775 964 8726
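
P.S. For reference, the kind of invocation discussed at the start of the
thread (a fixed psm and oem plus a character restriction) can be sketched as
below. The specific values are assumptions, not the ones actually tried:
--psm 7 treats the image as a single text line, --oem 1 selects the LSTM
engine, and tessedit_char_whitelist limits output to digits, '.' and '%'.
The function only prints the command line.

```shell
# Print (do not run) a tesseract command for one text line restricted to
# digits, '.' and '%'. The psm/oem defaults are illustrative assumptions.
build_ocr_cmd() {
  ocr_image="$1"
  ocr_psm="${2:-7}"   # 7 = treat the image as a single text line
  ocr_oem="${3:-1}"   # 1 = LSTM engine only
  printf 'tesseract %s - --psm %s --oem %s -c tessedit_char_whitelist=%s\n' \
    "$ocr_image" "$ocr_psm" "$ocr_oem" '0123456789.%'
}

build_ocr_cmd 717-300.png
# prints: tesseract 717-300.png - --psm 7 --oem 1 -c tessedit_char_whitelist=0123456789.%
```

As the thread shows, this kind of restriction alone was not enough for this
font; the whitelist only filters what the model may output and does not help
if the glyph shapes themselves are unfamiliar to the model.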
