- Fine tune. Starting with an existing trained language, train on your specific additional data. This may work for problems that are close to the existing training data, but different in some subtle way, like a particularly unusual font. May work with even a small amount of training data.
- Cut off the top layer (or some arbitrary number of layers) from the network and retrain a new top layer using the new data. If fine tuning doesn’t work, this is most likely the next best option. Cutting off the top layer could still work for training a completely new language or script, if you start with the most similar looking script.
- Retrain from scratch. This is a daunting task, unless you have a very representative and sufficiently large training set for your problem. If not, you are likely to end up with an over-fitted network that does really well on the training data, but not on the actual data.
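To make the second option (cutting off the top layer) a bit more concrete, here is a rough sketch of the two commands involved, wrapped in a small Python helper that only builds and prints them. It is not a tested recipe: all file paths are hypothetical placeholders, and the values `--append_index 5` and `--net_spec '[Lfx256 O1c111]'` are simply the ones used in the training documentation's example, so the right index depends on the network you start from. The new traineddata is assumed to be a starter file whose unicharset already contains the missing character.

```python
import shlex
from pathlib import Path


def build_commands(base_traineddata, new_traineddata, train_listfile,
                   workdir, append_index=5, net_spec="[Lfx256 O1c111]",
                   max_iterations=3000):
    """Return two argument lists: extract the LSTM, then retrain its top layer."""
    workdir = Path(workdir)
    base_lstm = workdir / (Path(base_traineddata).stem + ".lstm")

    # Step 1: pull the LSTM network out of the existing traineddata file.
    extract = ["combine_tessdata", "-e", str(base_traineddata), str(base_lstm)]

    # Step 2: keep the layers below append_index, put net_spec on top in place
    # of the part that was cut off, and train on the new data.
    train = [
        "lstmtraining",
        "--continue_from", str(base_lstm),
        "--traineddata", str(new_traineddata),  # starter with the new unicharset
        "--append_index", str(append_index),
        "--net_spec", net_spec,
        "--model_output", str(workdir / "base"),
        "--train_listfile", str(train_listfile),
        "--max_iterations", str(max_iterations),
    ]
    return extract, train


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    commands = build_commands(
        base_traineddata="tessdata_best/hye.traineddata",
        new_traineddata="train/hye/hye.traineddata",
        train_listfile="train/hye.training_files.txt",
        workdir="work/hye_layer",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

The script only prints the command lines so you can inspect and adjust them before running anything. Training data and the starter traineddata still have to be prepared beforehand as described in the training documentation.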
https://tesseract-ocr.github.io/tessdoc/tess5/TrainingTesseract-5.html

On Friday, October 20, 2023 at 1:44:40 PM UTC+3 renec...@gmail.com wrote:

> I have no idea what you mean by 'cut off the top layer'.
> Can I find documentation about this process somewhere?
> I am a tesseract user, not (yet) a tesseract specialist.
>
> On Sun, Oct 15, 2023 at 08:39, Des Bw <desal...@gmail.com> wrote:
>
>> Check the conversation in this forum where Schree trained the Norwegian
>> data to include the missing letter Æ. I used this method to train for
>> Amharic, and it worked for me.
>> Basically, the method is to cut off the top layer of the network and
>> train from there.
>> Fine tuning doesn't work for adding missing letters.
>>
>> On Sunday, October 8, 2023 at 9:38:57 PM UTC+3 renec...@gmail.com wrote:
>>
>>> I found that the official hye.traineddata does not include the և
>>> letter.
>>> Has anyone else experienced this problem? If so, what is the workaround?
>>>
>>> Thanks for an answer