On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:

> Hi guys,
>
> I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the
> selected language is English.
> I tried to teach the system Korean. The first step was creating
> sample data: I created some TIFF files with Korean text in them. Then I ran
> the tesseract command:
> tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] batch.nochop makebox
> Opening the newly created box file, I found that it contained only Latin
> characters. What's wrong?
>

Nothing is wrong ;-) If you do not specify a language (with the -l option, see
[1]), tesseract uses the default language: English. And as far as I know, English
uses Latin characters only. So try adding '-l kor' to your command (but do
not forget to install [2] first).
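
As a rough sketch, the makebox call with the language option added could look
like the following. The font name "batang" and the sample number 0 are only
placeholders I made up for illustration, and the sketch assumes that the
kor.traineddata file from [2] has already been unpacked into tesseract's
tessdata directory.

```shell
#!/bin/sh
# Sketch only: "batang" and the exp number 0 are hypothetical placeholders.
lang=kor
fontname=batang
num=0
base="$lang.$fontname.exp$num"

# Print the makebox command with the language given explicitly via -l.
makebox_cmd() {
    printf 'tesseract %s.tif %s -l %s batch.nochop makebox\n' "$base" "$base" "$lang"
}

makebox_cmd

# Run it only when tesseract is installed and the sample image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$base.tif" ]; then
    tesseract "$base.tif" "$base" -l "$lang" batch.nochop makebox
fi
```

The -l option goes after the output base name and before the config names
(batch.nochop makebox), which is the order shown in [1].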


> Maybe I have to change the system language?
>

As far as I know, tesseract does not care about the system language.


> Please advise me how to create a training data set. Thank you in
> advance,
>

The general rules are described here [3]. I suggest following them closely. Have a
look at the provided boxtiff files [4] for spa, eng, deu, ita, fra, nld as
examples.

There was an effort toward automatic training [5], but since the project
(tesseractindic) moved to GitHub I cannot find the folder
(tesseract_trainer) in the source code anymore.

Last advice: share your experiences with others ;-)

Zdenko

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Bootstrapping_a_new_character_set
[2] http://tesseract-ocr.googlecode.com/files/kor.traineddata.gz
[3] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
[4] http://code.google.com/p/tesseract-ocr/downloads/list
[5] http://code.google.com/p/tesseractindic/source/browse/#svn%2Ftrunk%2Ftesseract_trainer


> Oleg
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

