To use ocrd-train, you need to prepare image files and text files that share the same name but have different extensions. For example:
    sample1.png
    sample1.gt.txt

The .gt.txt file is a plain text file containing the correct text for the image, for example 145. The images must be cropped tightly, with no border or at most a couple of pixels. The text height should be about 30 to 40 px. Try different options to see what works best.

To recognize ONLY digits, you also need to replace this line in the Makefile:

    merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"

with:

    cp "$(TRAIN)/my.unicharset" "data/unicharset"

merge_unicharsets combines your characters with the full character set of the starting model; copying my.unicharset instead keeps only the characters that appear in your training data (here, the digits). See:
https://groups.google.com/forum/#!searchin/tesseract-ocr/l.bolzani%7Csort:date/tesseract-ocr/be4-rjvY2tQ/32evtMHlAQAJ

Then follow the instructions on the ocrd-train site. Try 100, 250, 500, 1000, and 2000 iterations and see which works best; the right number depends on how much training data you have.

If you only need to recognize handwritten digits, you could also look on GitHub for projects (unrelated to Tesseract) that do "MNIST" handwritten digit recognition with pre-trained models.

Bye,
Lorenzo

On Mon, Feb 4, 2019 at 08:34 <sarathgi...@gmail.com> wrote:

> I am a beginner for OCR training. Can anyone explain how to use Ocr-d
> train briefly?
>
> I have Tesseract and Leptonica library installed in Cygwin
>
> tesseract 4.0.0
> leptonica-1.77.0
> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
> Found AVX2
> Found AVX
> Found SSE
>
> I want to train handwritten digits, because it is not detecting correctly
> by default traineddata. I have searched group and found no detailed
> instructions. I used Opencv and python tesseract combination to achieve
> OCR of printed text and came to linux for handwritten digits training
> purpose. Kindly provide step by step instructions, it may help others also.
> I have attached the sample images which requires training. Thanks in advance
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/48ce49cc-6ade-4ebd-a1a6-5e382b033a95%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/48ce49cc-6ade-4ebd-a1a6-5e382b033a95%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
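The data preparation described in the reply above (NAME.png / NAME.gt.txt pairs, digits-only transcriptions, tightly cropped lines roughly 30-40 px high) can be checked automatically before starting a training run. The Python script below is a minimal sketch for that, not part of ocrd-train or Tesseract. It assumes PNG line images as in the sample1.png example and digits-only labels, and it treats the 30-40 px guideline as a soft check on the image height. The script and its function names (png_size, check_ground_truth) are hypothetical.

```python
"""Sanity-check an ocrd-train ground-truth folder before training.

A minimal sketch, not part of ocrd-train or Tesseract. Assumptions:
- line images are PNG files, each NAME.png paired with NAME.gt.txt;
- the model should recognize digits only, so transcriptions must be 0-9;
- the 30-40 px guideline is applied to the (tightly cropped) image height.
"""
import struct
import sys
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DIGITS = set("0123456789")
GT_SUFFIX = ".gt.txt"


def png_size(path: Path) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk that follows the PNG signature."""
    with path.open("rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path.name}: not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_ground_truth(folder: Path, min_h: int = 30, max_h: int = 40) -> list[str]:
    """Return one message per problem found in `folder`; empty means no problems."""
    problems = []
    for image in sorted(folder.glob("*.png")):
        gt = folder / (image.stem + GT_SUFFIX)  # sample1.png -> sample1.gt.txt
        if not gt.exists():
            problems.append(f"{image.name}: missing {gt.name}")
        else:
            text = gt.read_text(encoding="utf-8").strip()
            if not text or not set(text) <= DIGITS:
                problems.append(f"{gt.name}: expected digits only, got {text!r}")
        try:
            _, height = png_size(image)
        except ValueError as err:
            problems.append(str(err))
        else:
            if not min_h <= height <= max_h:
                problems.append(f"{image.name}: height {height}px outside {min_h}-{max_h}px")
    # Flag transcriptions that have no matching image.
    for gt in sorted(folder.glob("*" + GT_SUFFIX)):
        image = folder / (gt.name[: -len(GT_SUFFIX)] + ".png")
        if not image.exists():
            problems.append(f"{gt.name}: no matching {image.name}")
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 2:
        issues = check_ground_truth(Path(sys.argv[1]))
        print("\n".join(issues) if issues else "No problems found.")
        sys.exit(1 if issues else 0)
    print(f"usage: python {sys.argv[0]} GROUND_TRUTH_FOLDER")
```

Save it under any name, e.g. check_gt.py, and run python check_gt.py data/ground-truth (the folder name is only an example; use wherever your image/text pairs live). It prints one line per problem and exits with status 1 if anything needs fixing. Only the first 24 bytes of each PNG are read, so the check stays fast even on large datasets.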