Hi everyone, I was trying to create training data for the OCR-A font by following the guides I found online, but I ran into several problems. These are the steps I followed:

- Using jTessBoxEditor, I created the .box, .tif and font_properties files.
- I ran "tesseract lang_name.ocraextended.exp0.tif lang_name.ocraextended.exp0 --psm 6 nobatch box.train" to create the .tr file, and got the following output:

    Page 1
    APPLY_BOXES:
        Boxes read from boxfile: 230
        Found 230 good blobs.
    Generated training data for 230 words
    Page 2
    APPLY_BOXES:
        Boxes read from boxfile: 230
        Found 230 good blobs.
    Generated training data for 230 words
    Page 3
    APPLY_BOXES:
        Boxes read from boxfile: 130
        Found 130 good blobs.
    Generated training data for 130 words

- I ran "unicharset_extractor lang_name.ocraextended.exp0.box".
- I ran "mftraining -F lang_name.font_properties -U unicharset -O lang_name.unicharset lang_name.ocraextended.exp0.tr", which gave this output:

    Reading mftraining ...
    Failed to open tr file: mftraining
    Reading lang_name .ocraextended.exp0.tr ...
    Flat shape table summary: Number of shapes = 36 max unichars = 1 number with multiple unichars = 0
    N == sizeof(Cluster->Mean):Error:Assert failed:in file ../../../src/classify/cluster.cpp, line 2527

- Finally, I ran "cntraining lang_name.ocraextended.exp0.tr", but no files were created.

Can someone please help me? Thanks
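To rule out problems in the box file itself, here is a minimal Python sketch (my own helper, not part of Tesseract; the function and script names are hypothetical). It assumes the Tesseract 3.x box format, one box per line as "<glyph> <left> <bottom> <right> <top> <page>" with integer coordinates, and reports boxes per page, samples per glyph, and malformed or inverted boxes. The per-page counts should line up with the APPLY_BOXES numbers from box.train; it does not by itself explain the cluster.cpp assertion.

```python
"""Sanity-check a Tesseract 3.x .box file before running box.train.

Assumption: each non-empty line has the form
    <glyph> <left> <bottom> <right> <top> <page>
with integer pixel coordinates and a 0-based page index.
"""
import sys
from collections import Counter


def summarize_box_file(lines):
    """Return (boxes per page, samples per glyph, list of (line number, reason))."""
    per_page = Counter()
    per_glyph = Counter()
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # ignore blank lines
        # Split from the right so the five numeric fields are always the last five.
        parts = raw.rsplit(None, 5)
        if len(parts) != 6:
            problems.append((lineno, "expected 6 fields"))
            continue
        glyph, *fields = parts
        try:
            left, bottom, right, top, page = map(int, fields)
        except ValueError:
            problems.append((lineno, "non-integer field"))
            continue
        # A valid box must have positive width and height.
        if left >= right or bottom >= top:
            problems.append((lineno, "empty or inverted box"))
            continue
        per_page[page] += 1
        per_glyph[glyph] += 1
    return per_page, per_glyph, problems


def main(path):
    with open(path, encoding="utf-8") as f:
        per_page, per_glyph, problems = summarize_box_file(f)
    for page in sorted(per_page):
        print(f"page {page}: {per_page[page]} boxes")
    print(f"{len(per_glyph)} distinct glyphs")
    # Print the ten least frequent glyphs; these are worth re-checking in jTessBoxEditor.
    for glyph, count in sorted(per_glyph.items(), key=lambda kv: kv[1])[:10]:
        print(f"  {glyph!r}: {count} sample(s)")
    for lineno, reason in problems:
        print(f"line {lineno}: {reason}")


if __name__ == "__main__":
    # Check every .box file passed on the command line.
    for box_path in sys.argv[1:]:
        main(box_path)
```

Usage, with the script saved under a hypothetical name: "python check_box.py lang_name.ocraextended.exp0.box". If the printed page counts differ from 230, 230 and 130, the .box file and the .tr file were probably produced from different versions of the boxes.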

