Hi, I am working on training an LSTM model for old-style English printing (i.e. a font somewhat like Caslon, long-s and substantial printing defects). I am hoping to eventually submit to tessdata_contrib.
I have had some success with a script that generates line data using a modified version of Adobe Caslon Pro plus some noise generation, and then training on top of the eng model [1]. I went this way mainly because I do not want to have to extract lines from thousands of images and correct them all first.

However, because I am training on artificial data while the actual aim is to OCR real images, I would like to be able to evaluate the effects of various parameters more objectively. I am struggling to figure out how to generate the data that lstmeval needs in order to give me an answer.

The inputs I have are a directory of images and text files, laid out the same way as the directory of generated images and ground truth I am training with. What is the correct way to generate the required data for running lstmeval manually in this case?

[1]: https://en.wikisource.org/wiki/User:Inductiveload/Tesseract
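
For concreteness, here is a rough sketch of the kind of pipeline being asked about. It is not a confirmed answer. It assumes the evaluation directory holds pairs like `foo.png` / `foo.gt.txt`, that a matching `foo.box` file has already been produced for each line (for example the way the tesstrain Makefile does it), and that `lstmeval` is then pointed at a list file of `.lstmf` paths. The helper names and all paths are hypothetical, and the code only builds the command lines rather than running them.

```python
from pathlib import Path


def lstmf_command(image: Path) -> list[str]:
    """Command that should write <base>.lstmf next to the image.

    Assumption: <base>.box already exists beside the image, since the
    lstm.train config reads the box file for the ground truth.
    """
    base = image.with_suffix("")  # foo.png -> foo
    return ["tesseract", str(image), str(base), "--psm", "13", "lstm.train"]


def write_listfile(image_dir: Path, listfile: Path) -> list[Path]:
    """Write one .lstmf path per line, the format lstmeval's list file uses."""
    lstmf_files = sorted(image_dir.glob("*.lstmf"))
    listfile.write_text("".join(f"{p}\n" for p in lstmf_files))
    return lstmf_files


def lstmeval_command(model: Path, traineddata: Path, listfile: Path) -> list[str]:
    """Command to evaluate a checkpoint against the listed .lstmf files."""
    return [
        "lstmeval",
        "--model", str(model),
        "--traineddata", str(traineddata),
        "--eval_listfile", str(listfile),
    ]


if __name__ == "__main__":
    # Hypothetical locations; adjust to the real layout.
    eval_dir = Path("eval-lines")
    for png in sorted(eval_dir.glob("*.png")):
        print(" ".join(lstmf_command(png)))
    print(" ".join(lstmeval_command(
        Path("checkpoints/caslon_checkpoint"),
        Path("caslon/caslon.traineddata"),
        eval_dir / "eval.list",
    )))
```

Each printed command could then be run by hand or through `subprocess.run`, and `write_listfile` called once the `.lstmf` files exist.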