I just started using Tesseract, and I want to learn more about its training process.
As I understand it, we download a `.traineddata` file from the main repo and unpack it with `combine_tessdata` to get the `.lstm` file and other components such as `unicharset`.

What exactly do the `.traineddata` and `.lstm` files contain? I couldn't read them, since most of their contents are binary-encoded. My guess is that `.traineddata` simply bundles the data of all the other files into a single file.

My other question is about the input to the LSTM-based model. Take the example from "VGSL Specs - rapid prototyping of mixed conv/LSTM networks for images" in tessdoc (<https://tesseract-ocr.github.io/tessdoc/tess4/VGSLSpecs.html>):

`[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]`

Is the direct input to every Tesseract LSTM model a line image? Or does the line image still need to be converted into some other form before it is fed to the network?
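One way to test the guess about `.traineddata` is to list the components stored in it. The sketch below assumes the container layout is a little-endian `uint32` entry count followed by one `int64` offset per component (with `-1` marking a component that is absent), then the raw component bytes. That layout is an assumption based on a reading of Tesseract's `TessdataManager` source and may not hold for every version; byte-swapped files are not handled, and `list_components` is a hypothetical helper, not part of any Tesseract API.

```python
import struct
import sys


# ASSUMPTION: layout mirrors TessdataManager in Tesseract 4/5:
#   uint32 num_entries, then int64 offsets[num_entries] (-1 = absent),
#   then the component data. Little-endian only; not an official API.
def list_components(data: bytes) -> dict[int, tuple[int, int]]:
    """Return {component_index: (offset, size)} for every present component."""
    (num_entries,) = struct.unpack_from("<I", data, 0)
    offsets = struct.unpack_from(f"<{num_entries}q", data, 4)
    components = {}
    for i, offset in enumerate(offsets):
        if offset < 0:
            continue  # this component is not stored in the file
        # A component ends where the next present component starts,
        # or at the end of the file if no present component follows it.
        later = [o for o in offsets[i + 1:] if o >= 0]
        end = later[0] if later else len(data)
        components[i] = (offset, end - offset)
    return components


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for index, (offset, size) in list_components(f.read()).items():
            print(f"component {index}: offset={offset} size={size}")
```

If the assumption holds, the indices should line up with the `TessdataType` enum in Tesseract's `tessdatamanager.h`, and the sizes should match the files that `combine_tessdata` writes out when unpacking.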
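To make the example spec easier to read, here is a minimal sketch that splits it into the input shape and the layer tokens. The meanings in the lookup table follow the VGSL Specs page: the input block is `batch,height,width,depth`, where `0` means variable size and depth `1` is greyscale. The helpers `parse_vgsl` and `describe` are hypothetical, and the sketch only handles flat specs like this one, not bracketed series/parallel groups or other layer types.

```python
from dataclasses import dataclass


@dataclass
class VgslSpec:
    batch: int
    height: int  # 0 means variable height
    width: int   # 0 means variable width
    depth: int   # 1 means greyscale
    layers: list[str]


# Hypothetical lookup covering only the layer types in the example spec.
LAYER_DESCRIPTIONS = {
    "Ct": "convolution with tanh",
    "Mp": "max-pooling",
    "Lfys": "forward LSTM along y, summarizing the dimension to one step",
    "Lfx": "forward LSTM along x",
    "Lrx": "reverse LSTM along x",
    "O1c": "1-D output with softmax + CTC",
}


def parse_vgsl(spec: str) -> VgslSpec:
    """Split a flat VGSL spec into its input shape and layer tokens."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("VGSL spec must be enclosed in [ ]")
    tokens = body[1:-1].split()
    b, h, w, d = (int(x) for x in tokens[0].split(","))
    return VgslSpec(b, h, w, d, tokens[1:])


def describe(layer: str) -> str:
    # Try longer prefixes first so "Lfys64" is not matched by a shorter key.
    for prefix in sorted(LAYER_DESCRIPTIONS, key=len, reverse=True):
        if layer.startswith(prefix):
            return f"{layer}: {LAYER_DESCRIPTIONS[prefix]}, args {layer[len(prefix):]}"
    return f"{layer}: not covered by this sketch"


if __name__ == "__main__":
    spec = parse_vgsl("[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]")
    print(f"input: batch={spec.batch} height={spec.height} "
          f"width={spec.width} depth={spec.depth}")
    for layer in spec.layers:
        print(describe(layer))
```

Read this way, the input layer describes one greyscale image of variable height and width; the spec by itself does not say what preprocessing Tesseract applies to a line image before it reaches that layer.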