Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

Lorenzo Bolzani Mon, 04 Feb 2019 11:47:38 -0800

To use ocrd-train you need to prepare pairs of image files and text files
that share the same base name and differ only in extension.
For example:


sample1.png
sample1.gt.txt

The .gt.txt file is a plain text file containing the correct text for the
image, for example 145.
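
A quick way to check the pairing before training is a small script like the
one below. This is only a sketch: the function name find_unpaired and the
folder you pass to it are my own choices, not part of ocrd-train, and it
assumes .png images as in the example above.

```python
from pathlib import Path


def find_unpaired(data_dir):
    """Return (images without a .gt.txt, .gt.txt files without an image).

    Hypothetical helper: it only compares base names such as "sample1".
    """
    data_dir = Path(data_dir)
    # Strip the extensions to get the shared base name of each file.
    images = {p.name[: -len(".png")] for p in data_dir.glob("*.png")}
    texts = {p.name[: -len(".gt.txt")] for p in data_dir.glob("*.gt.txt")}
    return sorted(images - texts), sorted(texts - images)
```

Both returned lists should be empty before you start training.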

The images must be cropped to the text, with no border or at most a couple
of pixels. The text height should be about 30 to 40 px. Try different
options to see what works best.
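
To make the cropping rule concrete, here is a minimal sketch in plain
Python. It works on a grayscale image given as a list of rows (0 = black,
255 = white); in practice you would load and save the image with an image
library. The names tight_crop and scale_for_height, the threshold of 128
and the target of 35 px are assumptions of mine, not values from ocrd-train.

```python
def tight_crop(pixels, threshold=128, margin=2):
    """Crop to the dark (text) pixels, keeping `margin` pixels around them."""
    # Rows and columns that contain at least one dark pixel.
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return pixels  # blank image: nothing to crop to
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in pixels)]
    # Expand the bounding box by the margin, clamped to the image edges.
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(pixels) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


def scale_for_height(text_height_px, target_px=35):
    """Resize factor that brings the text height to about target_px."""
    return target_px / text_height_px
```

For instance, text that is 70 px tall would need a factor of 0.5 to land in
the 30 to 40 px range.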

To recognize numbers ONLY, you also need to replace the line:

   merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset  "$@"

with:

   cp "$(TRAIN)/my.unicharset" "data/unicharset"

in the makefile (see
https://groups.google.com/forum/#!searchin/tesseract-ocr/l.bolzani%7Csort:date/tesseract-ocr/be4-rjvY2tQ/32evtMHlAQAJ
)
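
Since the point of this change is a digits-only character set, it may be
worth checking that the ground truth really contains nothing but digits. A
minimal sketch, assuming the character set is derived from the .gt.txt
files (the function name non_digit_chars is hypothetical):

```python
from pathlib import Path


def non_digit_chars(data_dir):
    """Map each .gt.txt file name to the non-digit characters it contains."""
    found = {}
    for gt in sorted(Path(data_dir).glob("*.gt.txt")):
        # Ignore the trailing newline; spaces inside the text are reported too.
        text = gt.read_text(encoding="utf-8").strip()
        bad = sorted({ch for ch in text if ch not in "0123456789"})
        if bad:
            found[gt.name] = bad
    return found
```

An empty result means every ground truth file holds digits only. If your
numbers legitimately contain spaces or separators, adjust the allowed
characters accordingly.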

Then follow the instructions on the ocrd site.

You can try 100, 250, 500, 1000 and 2000 iterations and see what works best
(it depends on how much data you have).
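
To decide "what works best" you need some score for each run. One common
choice is the character error rate on images that were not used for
training. The sketch below is plain Python and independent of tesseract;
the function names and the layout of the results dictionary are my own
assumptions, and you would fill it with your own (correct text, OCR output)
pairs.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def char_error_rate(pairs):
    """pairs: list of (correct text, OCR output). Lower is better."""
    errors = sum(edit_distance(truth, ocr) for truth, ocr in pairs)
    total = sum(len(truth) for truth, _ in pairs)
    return errors / total if total else 0.0


def best_run(results):
    """results: {iterations: pairs}. Return the iteration count with lowest CER."""
    return min(results, key=lambda n: char_error_rate(results[n]))
```

Run the same held-out images through each trained model, collect the pairs
per iteration count, and compare the rates.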


If you need to recognize nothing but handwritten numbers, you can also look
for GitHub projects (not related to tesseract) on "MNIST" handwritten digit
recognition that come with pre-trained models.


Bye

Lorenzo


On Mon, 4 Feb 2019 at 08:34, <sarathgi...@gmail.com> wrote:

> I am a beginner at OCR training. Can anyone briefly explain how to use
> Ocr-d train?
>
> I have Tesseract and Leptonica library installed in Cygwin
>
> tesseract 4.0.0
>  leptonica-1.77.0
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX2
>  Found AVX
>  Found SSE
>
> I want to train on handwritten digits, because the default traineddata does
> not detect them correctly. I have searched the group and found no detailed
> instructions. I used a combination of OpenCV and Python Tesseract for OCR of
> printed text, and came to Linux to train on handwritten digits. Kindly
> provide step-by-step instructions; it may help others too. I have attached
> the sample images that require training. Thanks in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/48ce49cc-6ade-4ebd-a1a6-5e382b033a95%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/48ce49cc-6ade-4ebd-a1a6-5e382b033a95%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
