OK! Thank you!

On Sun, Aug 14, 2022 at 9:50 AM Zdenko Podobny <zde...@gmail.com> wrote:
> Try to follow the official training docs:
> https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5
>
> Zdenko
>
>
> On Sun, Aug 14, 2022 at 7:59, Benjamin Hall <codenamejupit...@gmail.com>
> wrote:
>
>> I am using Tesseract (pytesseract, to be exact) on a video, and I am
>> trying to get better detections by training Tesseract on a new font.
>> I found a tutorial at:
>>
>> https://www.youtube.com/watch?v=1v8BPw0Dn0I
>>
>> This tutorial gives very simple and straightforward directions on how
>> to do this (posted below):
>>
>>
>> *Step 1:* Make box files for the images that we want to train on.
>>
>> Syntax: tesseract [langname].[fontname].[expN].[file-extension]
>> [langname].[fontname].[expN] batch.nochop makebox
>>
>> Eg: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox
>>
>> {*Note: After making the box files, we have to correct any wrongly
>> identified characters in them.}
>>
>>
>> *Step 2:* Create the .tr file (combining the image file and the box file).
>>
>> Syntax: tesseract [langname].[fontname].[expN].[file-extension]
>> [langname].[fontname].[expN] box.train
>>
>> Eg: tesseract train.my.exp0.tif train.my.exp0 box.train
>>
>>
>> *Step 3:* Extract the charset from the box files (the output of this
>> command is the unicharset file).
>>
>> Syntax: unicharset_extractor [langname].[fontname].[expN].box
>>
>> Eg: unicharset_extractor train.my.exp0.box
>>
>>
>> *Step 4:* Create a font_properties file based on our needs.
>>
>> Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)]
>> [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
>>
>> Eg: echo "arial 0 0 1 0 0" > font_properties
>>
>>
>> *Step 5:* Training the data.
>>
>> Syntax: mftraining -F font_properties -U unicharset -O
>> [langname].unicharset [langname].[fontname].[expN].tr
>>
>> Eg: mftraining -F font_properties -U unicharset -O train.unicharset
>> train.my.exp0.tr
>>
>>
>> *Step 6:*
>>
>> Syntax: cntraining [langname].[fontname].[expN].tr
>>
>> Eg: cntraining train.my.exp0.tr
>>
>> {*Note: After step 5 and step 6, four files were created
>> (shapetable, inttemp, pffmtable, normproto).}
>>
>>
>> *Step 7:* Rename the four files (shapetable, inttemp, pffmtable,
>> normproto) to [langname].shapetable, [langname].inttemp,
>> [langname].pffmtable, [langname].normproto.
>>
>> Syntax: rename filename1 filename2
>>
>> Eg:
>>
>> rename shapetable train.shapetable
>>
>> rename inttemp train.inttemp
>>
>> rename pffmtable train.pffmtable
>>
>> rename normproto train.normproto
>>
>>
>> *Step 8:* Create the .traineddata file.
>>
>> Syntax: combine_tessdata [langname].
>>
>> Eg: combine_tessdata train.
>>
>>
>> Move the .traineddata file to the Tesseract program's tessdata directory:
>>
>> C:\Program Files\Tesseract-OCR\tessdata
>>
>>
>> Run tesseract with the trained font:
>>
>> tesseract Test2.png stdout -l train
>>
>>
>> I get a segmentation fault on Step 5. Has anyone ever had this issue
>> before? Thank you in advance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2fee7cdd-16cd-4997-8668-778e4d626c1an%40googlegroups.com.
>>
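The file naming in the quoted steps carries meaning: the training file is named [langname].[fontname].[expN], and Step 4 writes a font name into font_properties. In the quoted example the file is train.my.exp0 (font name "my"), while font_properties lists "arial". As far as I recall, the legacy training documentation says each .tr file name must match an entry in font_properties, so this mismatch may be worth ruling out before looking further into the Step 5 crash. That is an assumption to verify, not a confirmed diagnosis. The Python sketch below only builds the command lines from the quoted steps and checks the font name; it does not run Tesseract. The function names are hypothetical helpers, not part of Tesseract or pytesseract.

```python
from pathlib import Path


def parse_training_name(filename):
    """Split '[langname].[fontname].[expN].[ext]' into (lang, font, exp)."""
    parts = Path(filename).name.split(".")
    if len(parts) < 4:
        raise ValueError(f"expected lang.font.expN.ext, got {filename!r}")
    return parts[0], parts[1], parts[2]


def fonts_in_properties(properties_text):
    """Return the font names (first field of each non-empty line)."""
    fonts = set()
    for line in properties_text.splitlines():
        fields = line.split()
        if fields:
            fonts.add(fields[0])
    return fonts


def font_is_declared(filename, properties_text):
    """True if the font name in the file name appears in font_properties."""
    _, font, _ = parse_training_name(filename)
    return font in fonts_in_properties(properties_text)


def training_commands(image_name):
    """Build the command lines from the quoted steps (nothing is executed)."""
    lang, font, exp = parse_training_name(image_name)
    base = f"{lang}.{font}.{exp}"
    return [
        ["tesseract", image_name, base, "batch.nochop", "makebox"],  # Step 1
        ["tesseract", image_name, base, "box.train"],                # Step 2
        ["unicharset_extractor", f"{base}.box"],                     # Step 3
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],                  # Step 5
        ["cntraining", f"{base}.tr"],                                # Step 6
        ["combine_tessdata", f"{lang}."],                            # Step 8
    ]


if __name__ == "__main__":
    for cmd in training_commands("train.my.exp0.tif"):
        print(" ".join(cmd))
    # The quoted example declares "arial" but the file name says "my".
    print(font_is_declared("train.my.exp0.tr", "arial 0 0 1 0 0\n"))
```

If the check reports a mismatch, either renaming the training files to use the declared font name or changing the entry in font_properties would make the two agree. Note also that the reply in this thread points to the Tesseract 5 training docs, which describe a different procedure from the one quoted here.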