On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:
> Hi guys,
>
> I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the
> selected language is English.
> I tried to teach the system Korean. The first step was creating sample
> data, so I created some TIFF files with Korean text in them. Then I ran the
> tesseract command:
> tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] batch.nochop makebox
> Opening the newly created box file, I realized that it contained only Latin
> characters. What's wrong?
>

Nothing is wrong ;-) If you did not specify a language (with the -l option,
see [1]), tesseract used its default language: English. And as far as I know,
English uses Latin characters only. So try adding '-l kor' to your command
(but do not forget to install [2] first).

> Might be I have to change a system language?
>

As far as I know, tesseract does not care about the system language.

> Please advise me how to create a training data set. Thank you in
> advance,
>

The general rules are written in [3]. I suggest following them closely. Have a
look at the provided boxtiff files [4] for spa, eng, deu, ita, fra and nld as
examples.
There was an attempt at automatic training [5], but since the project
(tesseractindic) moved to GitHub I cannot find the folder (tesseract_trainer)
in the source code anymore.

Last advice: share your experiences with others ;-)

Zdenko

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Bootstrapping_a_new_character_set
[2] http://tesseract-ocr.googlecode.com/files/kor.traineddata.gz
[3] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
[4] http://code.google.com/p/tesseract-ocr/downloads/list
[5] http://code.google.com/p/tesseractindic/source/browse/#svn%2Ftrunk%2Ftesseract_trainer

> Oleg
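In case it helps, the '-l kor' suggestion above can be sketched as a small Python helper that only builds the makebox command line. This is a rough sketch, not a tested recipe: the helper name makebox_command and the font name "myfont" are made up for illustration, and I am assuming that -l goes before the config names (batch.nochop makebox), as in the bootstrapping section of the training wiki [1].

```python
def makebox_command(lang, fontname, num, ocr_lang=None):
    """Build the argument list for a tesseract makebox run.

    lang, fontname and num fill in the [lang].[fontname].exp[num]
    naming pattern from the training instructions. ocr_lang is the
    value passed to -l; when it is None, no -l option is added and
    tesseract falls back to its default language (English).
    """
    base = f"{lang}.{fontname}.exp{num}"
    args = ["tesseract", base + ".tif", base]
    if ocr_lang is not None:
        # Assumption: -l is placed before the config file names.
        args += ["-l", ocr_lang]
    args += ["batch.nochop", "makebox"]
    return args


if __name__ == "__main__":
    # "myfont" is a hypothetical font name used only for this example.
    cmd = makebox_command("kor", "myfont", 0, ocr_lang="kor")
    print(" ".join(cmd))
```

The printed line can be pasted into a command prompt, or the list can be handed to subprocess.run(cmd, check=True) once kor.traineddata from [2] is installed in the tessdata folder.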
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en