[tesseract-ocr] Advice on training for Old Amharic texts

2024-01-13 Thread Menelik Berhan
*Background* I'm trying to use tesseract 5.3.3 on scanned old books written in Amharic (which uses Ethiopic script). *Major Shortcomings of amh.traineddata from tesseract* *Difference in type of Ethiopic script:* there are Ethiopic script characters in old Amharic texts that are not used in the

Re: [tesseract-ocr] Advice on training for Old Amharic texts

2024-01-13 Thread Dellu Bw
I spent some time trying to improve the default Amharic model. The default model has a couple of characters missing. As I have noted in many posts in this forum, training by removing the top layer is the best method to introduce new characters. But I really struggled because the training is deter

Re: [tesseract-ocr] Advice on training for Old Amharic texts

2024-01-13 Thread Menelik Berhan
Thanks for your swift reply. It would be my pleasure to collaborate with you. I've noticed that there are extensive guides and tutorials on training tesseract 4.x, and I wanted to switch to the 4.x version. I wanted to ask what the trade-off would be if I used tesseract 4.x instead of 5.x

Re: [tesseract-ocr] Re: Unable to get Orientation with node-tesseract, Warning, detects only orientation with -l eng Error, OSD requires a model for the legacy engine

2024-01-13 Thread Zdenko Podobny
You do not need to rename traineddata. You can move them to tessdata subdirectory e.g. tessdata/fast, tessdata/best and then use it at "-l best/eng" or "-l fast/eng" Zdenko so 13. 1. 2024 o 3:38 Oliver Saintilien napĂ­sal(a): > Oh right, for those facing a similar issue, what I did was > 1. re