Omg thanks.
I hadn't thought about checking *that* documentation. I've been using 
tesseract.js with node, so I completely forgot that it was based on 
something else. How amateur.
I also didn't know that tesseract did its own processing as well.
Thanks again, I'll try everything there.
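
For the "regex the 30 out of the result" idea from my original message (quoted below), here is a minimal sketch of what I mean. The helper name extractNumber is made up for illustration and is not part of tesseract.js; it only assumes the OCR result arrives as a plain string.

```typescript
// Hypothetical helper (not a tesseract.js API): pull the first run of
// digits out of a raw OCR string, or return null if there are none.
function extractNumber(ocrText: string): number | null {
  const match = ocrText.match(/\d+/); // first consecutive digits, if any
  return match ? Number.parseInt(match[0], 10) : null;
}

// The three results reported in the quoted message:
console.log(extractNumber("Ss30")); // 30
console.log(extractNumber("a"));    // null
console.log(extractNumber("moh"));  // null
```

This only rescues results that already contain the right digits, so it does nothing for the red and blue images.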
On Tuesday, 20 April 2021 at 5:14:56 pm UTC+10 zdenop wrote:

> Hint: read documentation, stop guessing. You can start here 
> https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md
>
> Zdenko
>
>
> On Tue, 20 Apr 2021 at 9:11, Soul Green <[email protected]> wrote:
>
>> I am very new to coding so forgive me.
>>
>> I have been having an extremely low success rate with tesseract.
>> Here are 3 examples, both pre- and post-processing:
>>
>> [image: red1.jpg][image: croppedred1.jpg]
>> [image: yellow1.jpg][image: croppedyellow1.jpg]
>> [image: blue1.jpg][image: croppedblue1.jpg]
>> These were scanned as "a", "Ss30", and "moh" respectively.
>> I consider the yellow one a success, as I can just regex the 30 out of 
>> the result, but I still don't understand how it could be so off for the 
>> rest.
>>
>> I've tried different traineddatas, even including one that I trained 
>> myself on over 200 data examples.
>>
>> I have three theories as to why I couldn't train it:
>> 1. The different colours are processed differently, causing differently 
>> shaped characters. (Red looks bold and yellow looks thin)
>> 2. The different sizes of the images cause the characters to be slightly 
>> differently shaped when cropped.
>> 3. Tesseract assumes that the two lines of text are one, and reads them 
>> together.
>>  
>> Could someone please give me a hint on what to try? I don't want to spend 
>> another day training it on just blue ones (for example) only to find that 
>> colour isn't the problem.
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/9d819bc5-cf07-4c28-91a6-61b142ccc324n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/9d819bc5-cf07-4c28-91a6-61b142ccc324n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

