Omg thanks. I hadn't thought about checking *that* documentation. I've been using tesseract.js with Node, so I completely forgot that it was based on something else. How amateur. I also didn't know that Tesseract did its own processing as well. Thanks again, I'll try everything there.

On Tuesday, 20 April 2021 at 5:14:56 pm UTC+10 zdenop wrote:
> Hint: read documentation, stop guessing. You can start here
> https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md
>
> Zdenko
>
>
> On Tue, 20 Apr 2021 at 9:11, Soul Green <[email protected]> wrote:
>
>> I am very new to coding so forgive me.
>>
>> I have been having an extremely low success rate with Tesseract.
>> Here are 3 examples, both pre- and post-processing:
>>
>> [image: red1.jpg] [image: croppedred1.jpg]
>> [image: yellow1.jpg] [image: croppedyellow1.jpg]
>> [image: blue1.jpg] [image: croppedblue1.jpg]
>>
>> These were recognised as "a", "Ss30", and "moh" respectively.
>> I consider the yellow one a success, as I can just regex the 30 out of
>> the result, but I still don't understand how it could be so far off for
>> the rest.
>>
>> I've tried different traineddata files, including one that I trained
>> myself on over 200 examples.
>>
>> I have three theories as to why I couldn't train it successfully:
>> 1. The different colours are processed differently, causing differently
>> shaped characters. (Red looks bold and yellow looks thin.)
>> 2. The different image sizes cause the characters to be slightly
>> differently shaped when cropped.
>> 3. Tesseract assumes that the two lines of text are one, and reads them
>> together.
>>
>> Could someone please give me a hint on what to try? I don't want to spend
>> another day training it on just blue ones (for example) only to find that
>> colour isn't the problem.
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/9d819bc5-cf07-4c28-91a6-61b142ccc324n%40googlegroups.com
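One way to test theories 1 and 3 before spending another day on training is to normalise every crop yourself before handing it to tesseract.js. Below is a minimal sketch in TypeScript, assuming the crop is available as a raw RGBA buffer (for example the `data` field of a canvas `ImageData`). The helper names `toGray`, `otsuThreshold`, `binarize` and `splitLines` are hypothetical and not part of tesseract.js. It converts to grayscale, picks a global threshold with Otsu's method so red, yellow and blue text all end up as the same ink/background mask, and uses a horizontal projection profile to find separate text lines.

```typescript
// Hypothetical preprocessing helpers (not part of tesseract.js).
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray; // RGBA, 4 bytes per pixel
}

// Luminance with ITU-R BT.601 weights, so colour no longer matters.
function toGray(img: RgbaImage): Uint8Array {
  const gray = new Uint8Array(img.width * img.height);
  for (let i = 0; i < gray.length; i++) {
    const r = img.data[4 * i];
    const g = img.data[4 * i + 1];
    const b = img.data[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: choose the threshold that maximises between-class variance.
// Pixels <= threshold form one class, pixels > threshold the other.
function otsuThreshold(gray: Uint8Array): number {
  const hist = new Array<number>(256).fill(0);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;
  const total = gray.length;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];
  let sumLow = 0, weightLow = 0, best = 0, bestVar = -1;
  for (let t = 0; t < 256; t++) {
    weightLow += hist[t];
    sumLow += t * hist[t];
    if (weightLow === 0) continue;
    const weightHigh = total - weightLow;
    if (weightHigh === 0) break;
    const meanLow = sumLow / weightLow;
    const meanHigh = (sumAll - sumLow) / weightHigh;
    const between = weightLow * weightHigh * (meanLow - meanHigh) ** 2;
    if (between > bestVar) {
      bestVar = between;
      best = t;
    }
  }
  return best;
}

// Returns 1 for ink, 0 for background. ASSUMPTION: the text covers fewer
// pixels than the background, which decides whether ink is the dark or light class.
function binarize(gray: Uint8Array): Uint8Array {
  const t = otsuThreshold(gray);
  let darkCount = 0;
  for (let i = 0; i < gray.length; i++) if (gray[i] <= t) darkCount++;
  const inkIsDark = darkCount <= gray.length - darkCount;
  const mask = new Uint8Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    mask[i] = (gray[i] <= t) === inkIsDark ? 1 : 0;
  }
  return mask;
}

// Horizontal projection profile: rows with at least `minInk` ink pixels belong
// to a text line. Returns [top, bottom) row ranges, one per line.
function splitLines(mask: Uint8Array, width: number, height: number, minInk = 1): Array<[number, number]> {
  const lines: Array<[number, number]> = [];
  let start = -1;
  for (let y = 0; y < height; y++) {
    let ink = 0;
    for (let x = 0; x < width; x++) ink += mask[y * width + x];
    if (ink >= minInk && start < 0) start = y;
    if (ink < minInk && start >= 0) {
      lines.push([start, y]);
      start = -1;
    }
  }
  if (start >= 0) lines.push([start, height]);
  return lines;
}
```

Each `[top, bottom)` band could then be cropped and sent to tesseract.js on its own, with the page segmentation mode set to a single text line (Tesseract's `--psm 7`); check the tesseract.js docs for how your version exposes that setting. A global Otsu threshold is a simple choice and may struggle with uneven backgrounds, and if the bands still differ a lot in height, rescaling them to a common height (covered under "Rescaling" on the ImproveQuality page) is the next thing to try for theory 2.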

