OK! Thank you!

On Sun, Aug 14, 2022 at 9:50 AM Zdenko Podobny <zde...@gmail.com> wrote:

> try to follow official training docs:
> https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5
>
> Zdenko
>
>
> On Sun, Aug 14, 2022 at 7:59 AM Benjamin Hall <codenamejupit...@gmail.com>
> wrote:
>
>> I am using Tesseract (pytesseract, to be exact) on a video, and I am trying
>> to get better detections by training Tesseract on a new font. I found a
>> tutorial at:
>>
>> https://www.youtube.com/watch?v=1v8BPw0Dn0I
>>
>>
>> This tutorial gives very simple and straightforward directions on how
>> to do this (posted below):
>>
>>
>> *Step 1: *Make box files for images that we want to train
>>
>> Syntax: tesseract [langname].[fontname].[expN].[file-extension]
>> [langname].[fontname].[expN] batch.nochop makebox
>>
>> Eg: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox
>>
>>
>> {*Note: After making the box files, we have to correct any wrongly
>> identified characters in them.}
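
Correcting a box file by hand is easy to get wrong, so here is a minimal Python sketch of relabelling one entry. It assumes the usual box-file layout of one glyph per line as `<char> <left> <bottom> <right> <top> <page>`; the helper names (`Box`, `parse_box_line`, `relabel`) are made up for this illustration and are not part of Tesseract.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One line of a box file: the glyph plus its bounding box and page index."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so that only the last five fields are treated as numbers.
    char, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def format_box_line(box: Box) -> str:
    return f"{box.char} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def relabel(line: str, new_char: str) -> str:
    """Return the same box line with the recognized character replaced."""
    return format_box_line(parse_box_line(line)._replace(char=new_char))


# Example: the digit "1" was misread as a lowercase "l"; coordinates stay untouched.
fixed = relabel("l 10 20 30 40 0", "1")
```

Only the character field changes; the coordinates are written back exactly as they were read.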
>>
>>
>> *Step 2:* Create the .tr file (combining the image file and the box file)
>>
>> Syntax: tesseract [langname].[fontname].[expN].[file-extension]
>> [langname].[fontname].[expN] box.train
>>
>> Eg: tesseract train.my.exp0.tif train.my.exp0 box.train
>>
>>
>> *Step 3:* Extract the charset from the box files (Output for this
>> command is unicharset file)
>>
>> Syntax: unicharset_extractor [langname].[fontname].[expN].box
>>
>> Eg: unicharset_extractor train.my.exp0.box
>>
>>
>> *Step 4:* Create a font_properties file based on our needs.
>>
>> Syntax: echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0
>> or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties
>>
>> Eg: echo "arial 0 0 1 0 0" > font_properties
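
As an alternative to `echo`, the font_properties line can be written from Python, which avoids shell quoting differences (in the Windows cmd.exe shell, for instance, `echo` keeps the double quotes in its output). This is only a sketch of the format described in Step 4; the function names are hypothetical.

```python
def font_properties_line(fontname, italic=False, bold=False, fixed=False,
                         serif=False, fraktur=False):
    """Build one font_properties entry: the font name followed by five 0/1 flags."""
    if not fontname or any(ch.isspace() for ch in fontname):
        raise ValueError("font name must be non-empty and contain no whitespace")
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([fontname] + [str(int(bool(flag))) for flag in flags])


def write_font_properties(path, lines):
    """Write the entries one per line, without surrounding quotes."""
    with open(path, "w", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


# Same entry as the tutorial's example: a monospace font with no other flags set.
entry = font_properties_line("arial", fixed=True)
```

Calling `write_font_properties("font_properties", [entry])` would then produce the file that Step 5 reads.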
>>
>>
>> *Step 5:* Training the data.
>>
>> Syntax: mftraining -F font_properties -U unicharset -O
>> [langname].unicharset [langname].[fontname].[expN].tr
>>
>> Eg: mftraining -F font_properties -U unicharset -O train.unicharset
>> train.my.exp0.tr
>>
>>
>> *Step 6:*
>>
>> Syntax: cntraining [langname].[fontname].[expN].tr
>>
>> Eg: cntraining train.my.exp0.tr
>>
>> {*Note: After Step 5 and Step 6, four files have been created:
>> shapetable, inttemp, pffmtable, normproto.}
>>
>>
>> *Step 7:* Rename the four files (shapetable, inttemp, pffmtable, normproto)
>> to
>> ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto)
>>
>> Syntax: rename filename1 filename2
>>
>> Eg:
>>
>>     rename shapetable train.shapetable
>>
>>     rename inttemp train.inttemp
>>
>>     rename pffmtable train.pffmtable
>>
>>     rename normproto train.normproto
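
The four renames in Step 7 can also be done in one go. The sketch below uses only the Python standard library; `add_lang_prefix` is a hypothetical helper, and it assumes all four files sit in the same working directory.

```python
from pathlib import Path

# The four files named in Step 7.
LEGACY_OUTPUTS = ("shapetable", "inttemp", "pffmtable", "normproto")


def add_lang_prefix(workdir, langname):
    """Rename each output file to '<langname>.<name>' inside workdir."""
    workdir = Path(workdir)
    # Check everything first so a missing file does not leave a half-renamed set.
    missing = [name for name in LEGACY_OUTPUTS if not (workdir / name).is_file()]
    if missing:
        raise FileNotFoundError("missing training outputs: " + ", ".join(missing))
    renamed = []
    for name in LEGACY_OUTPUTS:
        target = workdir / f"{langname}.{name}"
        (workdir / name).replace(target)
        renamed.append(target)
    return renamed
```

For the example above this would be called as `add_lang_prefix(".", "train")`.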
>>
>>
>> *Step 8: *Create .traineddata file
>>
>> Syntax: combine_tessdata [langname].
>>
>> Eg: combine_tessdata train.
>>
>>
>> Move the .traineddata file to Tesseract's tessdata directory:
>>
>> C:\Program Files\Tesseract-OCR\tessdata
>>
>>
>> Run Tesseract with the trained font:
>>
>>
>> tesseract Test2.png stdout -l train
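
To keep the file names consistent from step to step, the command lines quoted above can be generated from the three naming parts. This is a sketch that only builds the argument lists and does not run anything; `legacy_training_commands` is a hypothetical name, and the arguments are copied from the tutorial steps as quoted, not checked against any particular Tesseract version.

```python
def legacy_training_commands(langname, fontname, exp_n, image_ext="tif"):
    """Return the tutorial's commands (Steps 1-3, 5, 6 and 8) as argument lists."""
    base = f"{langname}.{fontname}.exp{exp_n}"
    image = f"{base}.{image_ext}"
    return [
        ["tesseract", image, base, "batch.nochop", "makebox"],   # Step 1
        ["tesseract", image, base, "box.train"],                 # Step 2
        ["unicharset_extractor", f"{base}.box"],                 # Step 3
        # Step 4 (writing font_properties) is a file edit, not a command.
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{langname}.unicharset", f"{base}.tr"],          # Step 5
        ["cntraining", f"{base}.tr"],                            # Step 6
        # Step 7 (renaming the four output files) is also done by hand.
        ["combine_tessdata", f"{langname}."],                    # Step 8
    ]


for command in legacy_training_commands("train", "my", 0):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run` one at a time, stopping at the first non-zero exit code, so that a failure in an early step is noticed before Step 5.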
>>
>>
>> I get a segmentation fault on Step 5.  Has anyone ever had this issue
>> before?  Thank you in advance.
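
The thread does not establish what causes the crash. One thing that is cheap to check before running mftraining is whether the font name embedded in the .tr file name has an entry in font_properties: in the examples quoted above the file is train.my.exp0.tr (font name "my"), while the font_properties example lists "arial". Whether that mismatch is what triggers the segmentation fault here is an assumption, not something confirmed in this thread. The helper names below are hypothetical.

```python
from pathlib import Path


def fontname_from_tr(tr_name):
    """Extract [fontname] from a '[langname].[fontname].[expN].tr' file name."""
    parts = Path(tr_name).name.split(".")
    if len(parts) != 4 or parts[3] != "tr":
        raise ValueError(f"unexpected .tr file name: {tr_name!r}")
    return parts[1]


def fonts_in_properties(text):
    """Collect the first field of every non-blank font_properties line."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def preflight(tr_name, font_properties_text):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    font = fontname_from_tr(tr_name)
    if font not in fonts_in_properties(font_properties_text):
        problems.append(f"font_properties has no entry for font {font!r}")
    return problems


# With the tutorial's example values this reports the missing "my" entry.
issues = preflight("train.my.exp0.tr", "arial 0 0 1 0 0\n")
```

An empty result only rules out this one mismatch; it does not show that the .tr file itself is valid.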
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2fee7cdd-16cd-4997-8668-778e4d626c1an%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/2fee7cdd-16cd-4997-8668-778e4d626c1an%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
