Re: creating train data set for Korean

2013-09-22 Thread Oleg Tikhonov
You're welcome! On Sun, Sep 22, 2013 at 11:25 PM, clyde wrote: > Thank You Oleg and Zdenko! > > I am able to solve my problem. I just repeat the procedure in > https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01) > . > Thank you so much!!!

Re: creating train data set for Korean

2013-09-22 Thread clyde
Thank you, Oleg and Zdenko! I was able to solve my problem by repeating the procedure in https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01) . Thank you so much!

Re: creating train data set for Korean

2013-09-22 Thread zdenko podobny
On Sun, Sep 22, 2013 at 9:35 PM, clyde wrote: > Hello again Oleg, > > I am able to create korLang.traineddata, but when I try to use the > traineddata file I created > command: tesseract korLang.font1.exp1.tif output -l korLang > > I got this error: > tessdata_manager.SeekToStart(TESSDATA_INTTEM

Re: creating train data set for Korean

2013-09-22 Thread clyde
Hello again Oleg, I was able to create korLang.traineddata, but when I try to use the traineddata file I created with the command:
    tesseract korLang.font1.exp1.tif output -l korLang
I get this error:
    tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ..\..\classify\adaptmatch.c
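The assertion names TESSDATA_INTTEMP, which suggests that the inttemp component could not be found inside korLang.traineddata. Below is a minimal sketch of a pre-flight check, assuming the training outputs sit in one directory and carry the language prefix before they are combined. The component list is an assumption based on the Tesseract 3 training wiki, and the helper name missing_components is hypothetical, not part of Tesseract.

```python
from pathlib import Path

# Assumed minimum set of training outputs for Tesseract 3.0x; later
# versions may expect additional components.
REQUIRED = ("unicharset", "inttemp", "pffmtable", "normproto")


def missing_components(directory, lang):
    """Return the required components that lack a '<lang>.<name>' file."""
    base = Path(directory)
    return [name for name in REQUIRED
            if not (base / f"{lang}.{name}").is_file()]


if __name__ == "__main__":
    # Hypothetical usage: check the current directory for the korLang files.
    missing = missing_components(".", "korLang")
    if missing:
        print("missing before combining:", ", ".join(missing))
    else:
        print("all assumed components are present")
```

If the list is non-empty, the combined traineddata file would lack those parts, which is one plausible way to end up with the assertion above.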

Re: creating train data set for Korean

2013-09-22 Thread Oleg Tikhonov
Hi, First of all, please read the Tesseract wiki. It has a fairly good explanation of how to do this. If that's not enough, you can refer to http://misteroleg.wordpress.com/2012/12/19/ocr-using-tesseract-and-imagemagick-as-pre-processing-task/ . Hope it helps. On Sun, Sep 22, 2013 at 3:33 P

Re: creating train data set for Korean

2013-09-22 Thread clyde
Hello Oleg, Could you please post the steps you followed to train Tesseract OCR on Korean text? I am hoping for your response. Thank you in advance!

Re: creating train data set for Korean

2011-04-29 Thread Oleg Tikhonov
Hi guys, In the end, the problem was with Cygwin: it had somehow installed Tesseract 2.x (a couple of libraries) and linked them to Tesseract 3.0. The mix-up probably kept it from working correctly. I uninstalled all Cygwin packages, installed MS Visual Studio 2008 Express instead, svn-ed tesseract 3.0.1, bu

Re: creating train data set for Korean

2011-04-29 Thread Oleg Tikhonov
Interesting. I used Cygwin on Windows 7. In general, I installed Leptonica and its dependencies, and after that I installed Tesseract 3.0 from the archive file:
    ./runautoconfig
    ./configure
    make
    make install
I checked config_auto.h:
    /* Version number */
    #define PACKAGE_VERSION "3.00"
    /* Offic

Re: creating train data set for Korean

2011-04-29 Thread Quan Nguyen
Looks like you're running a Tesseract 2.0x version, which does not support Oriental scripts. Download and install Tesseract 3.01, then try training again. On Apr 29, 7:09 am, Oleg Tikhonov wrote: > Here is a command and the error/message > > $ tesseract.exe ../korean_training/annyong_eng.png > ../korean

Re: creating train data set for Korean

2011-04-29 Thread zdenko podobny
Oleg, Are you sure about the message? "tesseract.exe" indicates that you are using Windows (I am not aware of any official Linux build system that creates 'tesseract.exe'), but part of the error message ('/usr/share/tessdata/') indicates that you are in a Linux (or Unix-like) environment... You wrote that you ins

Re: creating train data set for Korean

2011-04-29 Thread Oleg Tikhonov
Zdenko, Honestly, I did not read the whole page; I beg your pardon. Here is the command and the error message:
    $ tesseract.exe ../korean_training/annyong_eng.png ../korean_training/annyong_eng.png -l kor batch.nochop makebox
    Unable to load unicharset file /usr/share/tessdata/kor.unicharset
Thanks,

Re: creating train data set for Korean

2011-04-29 Thread Oleg Tikhonov
Zdenko, Quan and Sven, Thanks a lot for your suggestions; I think you nailed the problem. So, I installed the Korean language pack :-) However, the archive has only one file, kor.traineddata. It doesn't have kor.unicharset, which causes a problem: during "loading" kor.traineddata, tesseract also de

Re: creating train data set for Korean

2011-04-28 Thread Sven Pedersen
Hi Oleg, As Quan said, you need a higher-resolution image, about 200-300 dpi, and it needs to be binary (black and white), not grayscale or color. Screenshots are typically only 72-90 dpi. I see that the wiki states the character size in pixels in a confusing way. --Sven 2011/4/28 Quan Nguyen : > Pri
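The arithmetic behind the resolution advice can be sketched as follows, assuming the usual convention of 72 points per inch. The function names and the 12-point example size are hypothetical and chosen only for illustration; the fixed threshold of 128 is likewise an assumption, not a value from the wiki.

```python
def glyph_height_px(point_size, dpi):
    """Approximate text height in pixels; one point is 1/72 of an inch."""
    return point_size * dpi / 72.0


def binarize(gray_rows, threshold=128):
    """Map 0-255 grayscale values to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_rows]


if __name__ == "__main__":
    # Compare a screenshot-like resolution with scan-like resolutions.
    for dpi in (72, 200, 300):
        print(dpi, "dpi:", round(glyph_height_px(12, dpi), 1), "px")
    # A tiny 1x4 grayscale "image" reduced to black and white.
    print(binarize([[0, 127, 128, 255]]))
```

The same point size covers roughly four times as many pixels at 300 dpi as at 72 dpi, which is why screenshots tend to give the trainer too little detail per character.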

Re: creating train data set for Korean

2011-04-28 Thread Quan Nguyen
Print screens are, in general, not adequate for training new languages. You'd be better off using GIMP to produce your TIFF images. Be sure to specify the language to bootstrap the new charset, such as:
    $ tesseract.exe ../korean_training/kor.ariel.exp1.tif ../korean_training/kor.ariel.exp1 -l kor

Re: creating train data set for Korean

2011-04-28 Thread Oleg Tikhonov
Hi Sven, Here is what I've done: 1. Found 10 Korean pangrams (sentences that contain the whole Korean alphabet, plus punctuation). 2. Opened Notepad++ and pasted each pangram line by line, mixed with punctuation, changed the encoding to UTF-8, increased the font size to 12 px, formatted a whole text that

Re: creating train data set for Korean

2011-04-28 Thread zdenko podobny
On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov wrote: > Hi guys, > > I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the selected > language is English. > I tried to teach the system Korean. The first step was creating > sample of data, I created some tiff files with Korean in

Re: creating train data set for Korean

2011-04-28 Thread Aravinda VK
The generated box file will not contain Korean characters. Use any of the box editors mentioned on the training page; they were created for this purpose. A box editor splits the image from the provided TIFF into blocks, creates a rectangle for each one, and assigns a value to it. Adjust the size of these rectangles in
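A small sketch for checking how much of a generated box file is actually Korean before opening it in a box editor. It assumes the Tesseract 3 box line layout of six space-separated fields (symbol, left, bottom, right, top, page); the function names and the two sample lines are hypothetical.

```python
def parse_box_line(line):
    """Split one box line into (symbol, (left, bottom, right, top), page)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    symbol = fields[0]
    left, bottom, right, top, page = (int(f) for f in fields[1:])
    return symbol, (left, bottom, right, top), page


def is_hangul(char):
    """True for Hangul syllables, Jamo, and compatibility Jamo."""
    code = ord(char)
    return (0xAC00 <= code <= 0xD7A3
            or 0x1100 <= code <= 0x11FF
            or 0x3130 <= code <= 0x318F)


def hangul_ratio(lines):
    """Fraction of boxes whose symbol contains at least one Hangul char."""
    symbols = [parse_box_line(l)[0] for l in lines if l.strip()]
    if not symbols:
        return 0.0
    hits = sum(1 for s in symbols if any(is_hangul(c) for c in s))
    return hits / len(symbols)


if __name__ == "__main__":
    # Made-up sample: one Latin box and one Hangul box.
    sample = ["a 10 20 30 40 0", "안 35 20 60 45 0"]
    print(hangul_ratio(sample))
```

A ratio near zero on a Korean training image means the symbols still have to be corrected by hand in a box editor.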

Re: creating train data set for Korean

2011-04-28 Thread Sven Pedersen
Hi Oleg, Did you create a file with a mapping of character codes? Or a Korean text file that you printed and scanned in? Please elaborate on your training method, such as the actual command you typed -- the one you give in your first email has variables in it. --Sven On Thu, Apr 28, 2011 at 11:23 AM,

Re: creating train data set for Korean

2011-04-28 Thread Oleg Tikhonov
That's exactly where I started and got stuck. The produced box file does not contain any Korean characters, only Latin ones, and that is the problem. On Thu, Apr 28, 2011 at 7:08 PM, Sriranga(78yrsold) wrote: > please read wiki on tesseract3 wherein details how to train lang > > On Thu, Apr 28, 2011 at 9:33

Re: creating train data set for Korean

2011-04-28 Thread Sriranga(78yrsold)
Please read the Tesseract 3 wiki, which details how to train a language. On Thu, Apr 28, 2011 at 9:33 PM, Oleg Tikhonov wrote: > Hi guys, > > I've installed tesseract-ocr 3.0 on Windows 7. All work fine if selected > language is English. > I tried to add/teach the system the Korean. The first step was cr