Re: Get italic info from Tesseract 3 command line?

2011-04-28 Thread Nikse
Thanks for your answer, Quan Nguyen, and sorry for my unclear question! I can get hOCR output, but it does not contain any "<em>" tags when OCR'ing italic text. Is this working for anybody? On Apr 29, 5:46 am, Quan Nguyen wrote: > http://groups.google.com/group/tesseract-ocr/browse_thread/thread/2f4
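
One way to check whether an hOCR file carries any italic markup is to count the emphasis tags in it. The following is a minimal sketch, assuming the italic markup in question is the HTML <em> element (and <strong> for bold); the sample string is hand-written for illustration and is not real Tesseract output.

```python
from html.parser import HTMLParser


class EmphasisCounter(HTMLParser):
    """Count <em> and <strong> start tags in an hOCR (HTML) document."""

    def __init__(self):
        super().__init__()
        self.counts = {"em": 0, "strong": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.counts:
            self.counts[tag] += 1


def count_emphasis(hocr_text):
    """Return a dict with the number of <em> and <strong> tags found."""
    parser = EmphasisCounter()
    parser.feed(hocr_text)
    parser.close()
    return parser.counts


# Hand-written sample (an assumption, not actual Tesseract output).
sample = (
    '<span class="ocr_line" title="bbox 0 0 100 20">'
    '<span class="ocrx_word">plain</span> '
    '<span class="ocrx_word"><em>slanted</em></span>'
    "</span>"
)
print(count_emphasis(sample))
```

If the counts stay at zero for a page that is known to contain italic text, the markup is not being emitted for that input.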

Re: Several input files into one output file

2011-04-28 Thread Dmitri Silaev
I bet you'll need a GUI to combine images. Otherwise you'll need a script anyway and it's not worth it. If you're on Windows, I can suggest one of the best and free tools - FastStone Image Viewer. Making multipage TIFFs is only one of its numerous great features. Warm regards, Dmitri Silaev www.C

Re: creating train data set for Korean

2011-04-28 Thread Sven Pedersen
Hi Oleg, As Quan said, you need a higher-resolution image, about 200-300 dpi, and it needs to be binary (black and white), not grayscale or color. Screenshots are typically only 72-90 dpi. I see that the wiki states the character size in pixels in a confusing way. --Sven 2011/4/28 Quan Nguyen : > Pri
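
The effect of resolution on character size can be worked out with simple arithmetic. This is a rough sketch, assuming the font size is given in points (1 point = 1/72 inch); the function name is made up for illustration, and real glyphs are somewhat smaller than the nominal font size.

```python
def glyph_height_px(point_size, dpi):
    """Nominal font height in pixels: one point is 1/72 of an inch."""
    return point_size * dpi / 72.0


# Compare a 12 pt font at screenshot resolutions and at scan resolutions.
for dpi in (72, 90, 200, 300):
    print(dpi, "dpi:", round(glyph_height_px(12, dpi), 1), "px")
```

At 72 dpi a 12 pt font is nominally only 12 pixels tall, while at 300 dpi the same font is 50 pixels tall, which is why a screenshot gives the recognizer far less detail than a scan.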

Re: Several input files into one output file

2011-04-28 Thread Robert Komar
On Fri, 29 Apr 2011, Dmitri Silaev wrote: No, with Tesseract itself it's not possible. This is a job for good old batch files or scripts. Warm regards, Dmitri Silaev www.CustomOCR.com On Fri, Apr 29, 2011 at 5:41 AM, faye wrote: Is there an option to let tesseract write the output of seve

Re: Get italic info from Tesseract 3 command line?

2011-04-28 Thread Quan Nguyen
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/2f408e3f9b054edb http://code.google.com/p/tesseract-ocr/issues/detail?id=377#c5 On Apr 28, 7:54 am, Nikse wrote: > I can see that in baseapi.cpp in method "GetHOCRText" there seems to > be support for italic in line 936/937: >    

Re: Several input files into one output file

2011-04-28 Thread Quan Nguyen
You can try VietOCR, a frontend program that uses the Tesseract engine to perform OCR on multi-page TIFFs or individual images and appends the output to previous results. On Apr 28, 8:41 pm, faye wrote: > Is there an option to let tesseract write the output of several images > into one large text file? >

Re: creating train data set for Korean

2011-04-28 Thread Quan Nguyen
Print screens are, in general, not adequate for training new languages. You'd be better off using GIMP to produce your TIFF images. Be sure to specify the language to bootstrap the new charset, such as: $ tesseract.exe ../korean_training/kor.ariel.exp1.tif ../korean_training/kor.ariel.exp1 -l kor

Re: Several input files into one output file

2011-04-28 Thread Dmitri Silaev
No, with Tesseract itself it's not possible. This is a job for good old batch files or scripts. Warm regards, Dmitri Silaev www.CustomOCR.com On Fri, Apr 29, 2011 at 5:41 AM, faye wrote: > Is there an option to let tesseract write the output of several images > into one large text file? > > I
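
A script of the kind suggested here could look roughly like the following. It is only a sketch under assumptions: it relies on the usual command-line form "tesseract imagename outputbase [-l lang]", which writes outputbase.txt, and the function names and the work directory are invented for illustration.

```python
import subprocess
from pathlib import Path


def run_tesseract(image, out_base, lang=None):
    """Run the tesseract command line on one image; it writes <out_base>.txt."""
    cmd = ["tesseract", str(image), str(out_base)]
    if lang:
        cmd += ["-l", lang]
    subprocess.run(cmd, check=True)


def ocr_to_single_file(images, combined_path, work_dir, ocr=run_tesseract):
    """OCR each image in the given order and append the text to one file.

    The `ocr` parameter is a hypothetical hook so that the OCR step can be
    swapped out, for example when trying the script without Tesseract.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    with open(combined_path, "w", encoding="utf-8") as combined:
        for image in images:
            out_base = work_dir / Path(image).stem
            ocr(image, out_base)
            # Append ".txt" by hand so stems containing dots stay intact.
            page_text = Path(str(out_base) + ".txt").read_text(encoding="utf-8")
            combined.write(page_text)
            if not page_text.endswith("\n"):
                combined.write("\n")
```

For a scanned book, a call such as ocr_to_single_file(sorted(Path("scans").glob("*.tif")), "book.txt", "ocr_tmp") would process the pages in file-name order; the directory and file names in that call are placeholders.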

Several input files into one output file

2011-04-28 Thread faye
Is there an option to let tesseract write the output of several images into one large text file? I have scanned a book and want to OCR all pages into one big text file if possible (instead of copying all the text files into one later). Kind regards, Faye -- You received this message because you are subs

Re: creating train data set for Korean

2011-04-28 Thread Oleg Tikhonov
Hi Sven, Here is what I've done: 1. Found 10 Korean pangrams (sentences that contain the whole Korean alphabet plus punctuation) 2. Opened Notepad++ and pasted each pangram line by line, mixed with punctuation, changed the encoding to UTF-8, increased the font size to 12 px, formatted a whole text that

Re: creating train data set for Korean

2011-04-28 Thread zdenko podobny
On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov wrote: > Hi guys, > > I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the selected > language is English. > I tried to teach the system Korean. The first step was creating > sample data: I created some TIFF files with Korean in

Re: creating train data set for Korean

2011-04-28 Thread Aravinda VK
The generated box file will not contain Korean characters. Use any of the box editors mentioned on the training page; box editors are made for that purpose. A box editor splits the provided TIFF into image blocks, creates a rectangle for each, and assigns some value to it. Adjust the size of these rectangles in

Re: creating train data set for Korean

2011-04-28 Thread Sven Pedersen
Hi Oleg, Did you create a file with a mapping of character codes? Or a Korean text file that you printed and scanned in? Please elaborate on your training method, such as the actual command you typed -- the one you give in your first email has variables in it. --Sven On Thu, Apr 28, 2011 at 11:23 AM,

Re: creating train data set for Korean

2011-04-28 Thread Oleg Tikhonov
That's exactly where I started and got stuck. The generated box file does not contain any Korean characters, only Latin ones. And that is the problem. On Thu, Apr 28, 2011 at 7:08 PM, Sriranga(78yrsold) wrote: > Please read the wiki on Tesseract 3, which details how to train a language. > > On Thu, Apr 28, 2011 at 9:33

Re: creating train data set for Korean

2011-04-28 Thread Sriranga(78yrsold)
Please read the wiki on Tesseract 3, which details how to train a language. On Thu, Apr 28, 2011 at 9:33 PM, Oleg Tikhonov wrote: > Hi guys, > > I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the selected > language is English. > I tried to teach the system Korean. The first step was cr

creating train data set for Korean

2011-04-28 Thread Oleg Tikhonov
Hi guys, I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine if the selected language is English. I tried to teach the system Korean. The first step was creating sample data: I created some TIFF files with Korean in them. Afterwards, I ran the tesseract command: tesseract [lang].[fontname].ex

Re: Get italic info from Tesseract 3 command line?

2011-04-28 Thread Nikse
I can see that in baseapi.cpp, in the method "GetHOCRText", there seems to be support for italic in lines 936/937: if (word->italic > 0) hocr_str += "<em>"; Does anybody know if that's supposed to work? TIA Nikolaj

Windows FREEOCR ACCESSIBILITY QUESTION...

2011-04-28 Thread Renzo van Buuren
Hi, Hopefully I am at the right place to ask my question. As a person with poor vision, as well as a user of a screen reader, the FreeOCR program is a very welcome tool for me. Because I use this screen reader, I work on the PC using shortcut keys. Here is my problem. Making u