I am using Tesseract (pytesseract, to be exact) on a video, and I am trying to get better detections by training Tesseract on a new font. I found a tutorial at:
https://www.youtube.com/watch?v=1v8BPw0Dn0I

This tutorial gives very simple, straightforward directions on how to do this (posted below):

*Step 1:* Make box files for the images that we want to train on.
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
Eg: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox
{*Note: After making the box files, we have to correct any wrongly identified characters in them.}

*Step 2:* Create the .tr file (combining the image file and the box file).
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
Eg: tesseract train.my.exp0.tif train.my.exp0 box.train

*Step 3:* Extract the charset from the box files (the output of this command is the unicharset file).
Syntax: unicharset_extractor [langname].[fontname].[expN].box
Eg: unicharset_extractor train.my.exp0.box

*Step 4:* Create a font_properties file based on our needs.
Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
Eg: echo "arial 0 0 1 0 0" > font_properties

*Step 5:* Training the data.
Syntax: mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
Eg: mftraining -F font_properties -U unicharset -O train.unicharset train.my.exp0.tr

*Step 6:*
Syntax: cntraining [langname].[fontname].[expN].tr
Eg: cntraining train.my.exp0.tr
{*Note: After step 5 and step 6, four files are created (shapetable, inttemp, pffmtable, normproto).}

*Step 7:* Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto).
Syntax: rename filename1 filename2
Eg:
rename shapetable train.shapetable
rename inttemp train.inttemp
rename pffmtable train.pffmtable
rename normproto train.normproto

*Step 8:* Create the .traineddata file.
Syntax: combine_tessdata [langname].
Eg: combine_tessdata train.

Move the .traineddata file to the Tesseract program's tessdata directory:
C:\Program Files\Tesseract-OCR\tessdata

Run Tesseract with the trained font:
tesseract Test2.png stdout -l train

I get a segmentation fault on Step 5. Has anyone ever had this issue before?

Thank you in advance.
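
P.S. For reference, here is a minimal Python sketch of how the command lines in the steps above are put together from the language name, font name, and experiment number. It only builds the commands and does not run Tesseract, and it does not address the segmentation fault. The helper names (`build_training_commands`, `font_properties_line`, `rename_map`) are my own and hypothetical, not part of Tesseract or pytesseract; the commands and arguments are copied from the tutorial steps as written.

```python
from typing import Dict, List


def font_properties_line(font: str, italic: int = 0, bold: int = 0,
                         monospace: int = 0, serif: int = 0,
                         fraktur: int = 0) -> str:
    """Step 4: one font_properties line in the tutorial's field order."""
    return f"{font} {italic} {bold} {monospace} {serif} {fraktur}"


def rename_map(lang: str) -> Dict[str, str]:
    """Step 7: old name -> new name for the four generated files."""
    names = ["shapetable", "inttemp", "pffmtable", "normproto"]
    return {name: f"{lang}.{name}" for name in names}


def build_training_commands(lang: str, font: str, exp: int,
                            ext: str = "tif") -> List[List[str]]:
    """Steps 1-3, 5, 6 and 8 as argument lists. Nothing is executed here."""
    base = f"{lang}.{font}.exp{exp}"   # e.g. train.my.exp0
    image = f"{base}.{ext}"            # e.g. train.my.exp0.tif
    return [
        # Step 1: make the box file
        ["tesseract", image, base, "batch.nochop", "makebox"],
        # Step 2: create the .tr file
        ["tesseract", image, base, "box.train"],
        # Step 3: extract the unicharset
        ["unicharset_extractor", f"{base}.box"],
        # Step 5: mftraining (the step that crashes for me)
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        # Step 6: cntraining
        ["cntraining", f"{base}.tr"],
        # Step 8: combine into [lang].traineddata (note the trailing dot)
        ["combine_tessdata", f"{lang}."],
    ]


if __name__ == "__main__":
    # Print the commands for the tutorial's example names.
    for argv in build_training_commands("train", "my", 0):
        print(" ".join(argv))
    print(font_properties_line("arial", monospace=1))
```

Each inner list could be passed to `subprocess.run(argv, check=True)` to actually execute a step, assuming the Tesseract training tools are on the PATH; I kept that out of the sketch so it stays runnable without Tesseract installed.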

