Thx for your answer Quan Nguyen, and sorry for my unclear question!
I can get hocr output... but it does not contain any "<em>" tags when
OCR'ing italic text.
Is this working for anybody?
On Apr 29, 5:46 am, Quan Nguyen wrote:
> http://groups.google.com/group/tesseract-ocr/browse_thread/thread/2f4
I bet you'll need a GUI to combine images.
Otherwise you'll need a script anyway and it's not worth it.
If you're on Windows, I can suggest one of the best and free tools -
FastStone Image Viewer.
Making multipage TIFFs is only one of its numerous great features.
Warm regards,
Dmitri Silaev
www.C
Hi Oleg,
As Quan said, you need a higher resolution image, about 200--300 dpi
and it needs to be binary (black&white) not grayscale or color.
Screenshots are typically only 72--90 dpi. I also see that the wiki
states the character size in pixels in a confusing way.
--Sven
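
To make the resolution gap concrete, here is a small arithmetic sketch. The function names are hypothetical and only illustrate the 72 dpi versus 300 dpi figures mentioned above; it does not claim that upscaling a screenshot recovers real detail.

```python
def scale_factor(source_dpi, target_dpi):
    """Ratio between the target and source resolutions."""
    return target_dpi / source_dpi


def pixels_needed(size_px, source_dpi, target_dpi):
    """Pixel length at target_dpi covering the same physical length
    that size_px covers at source_dpi."""
    return round(size_px * target_dpi / source_dpi)
```

For example, a glyph that is 12 pixels tall in a 72 dpi screenshot would span 50 pixels at 300 dpi, so a screenshot carries roughly a quarter of the linear detail of a proper scan.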
2011/4/28 Quan Nguyen :
> Pri
On Fri, 29 Apr 2011, Dmitri Silaev wrote:
No, with Tesseract itself it's not possible.
This is a job for good old batch files or scripts.
Warm regards,
Dmitri Silaev
www.CustomOCR.com
On Fri, Apr 29, 2011 at 5:41 AM, faye wrote:
Is there an option to let tesseract write the output of seve
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/2f408e3f9b054edb
http://code.google.com/p/tesseract-ocr/issues/detail?id=377#c5
On Apr 28, 7:54 am, Nikse wrote:
> I can see that in baseapi.cpp in method "GetHOCRText" there seems to
> be support for italic in line 936/937:
>
You can try VietOCR, a frontend program which uses the Tesseract engine
to perform OCR on multi-page or single-page TIFFs and appends the
output to the previous results.
On Apr 28, 8:41 pm, faye wrote:
> Is there an option to let tesseract write the output of several images
> into one large text file?
>
Print screens are, in general, not adequate for training new
languages. You'd be better off using GIMP to produce your TIFF images.
Be sure to specify the language to bootstrap the new charset, such as:
$ tesseract.exe ../korean_training/kor.ariel.exp1.tif ../korean_training/kor.ariel.exp1 -l kor
No, with Tesseract itself it's not possible.
This is a job for good old batch files or scripts.
Warm regards,
Dmitri Silaev
www.CustomOCR.com
On Fri, Apr 29, 2011 at 5:41 AM, faye wrote:
> Is there an option to let tesseract write the output of several images
> into one large text file?
>
> I
Is there an option to let tesseract write the output of several images
into one large text file?
I have scanned a book and want to OCR all pages into one big text file
if possible (instead of copying all the text files into one afterwards).
kind regards
Faye
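
The script approach suggested in the replies could look roughly like the sketch below. It assumes the `tesseract` executable is on the PATH and is called as `tesseract <image> <outputbase> -l <lang>`, which writes `<outputbase>.txt`. The function names, the `*.tif` pattern, and the directory layout are hypothetical choices for illustration, not part of Tesseract.

```python
import subprocess
from pathlib import Path


def ocr_page(image_path, output_base, lang="eng"):
    """Run tesseract on one image; tesseract writes <output_base>.txt."""
    subprocess.run(
        ["tesseract", str(image_path), str(output_base), "-l", lang],
        check=True,  # raise if tesseract exits with an error
    )
    return Path(str(output_base) + ".txt")


def merge_text_files(text_paths, merged_path):
    """Concatenate per-page text files, in the given order, into one file."""
    with open(merged_path, "w", encoding="utf-8") as merged:
        for path in text_paths:
            text = Path(path).read_text(encoding="utf-8")
            merged.write(text)
            if not text.endswith("\n"):
                merged.write("\n")  # keep pages from running together
    return merged_path


def ocr_book(image_dir, merged_path, lang="eng", pattern="*.tif"):
    """OCR every matching image in image_dir and merge the results."""
    pages = sorted(Path(image_dir).glob(pattern))  # sorted by file name
    text_paths = [ocr_page(p, p.with_suffix(""), lang) for p in pages]
    return merge_text_files(text_paths, merged_path)
```

A call such as `ocr_book("scans", "book.txt")` would then OCR every TIFF in a `scans` folder and write one combined file. Because pages are ordered by file name, zero-padded names (page001.tif, page002.tif, ...) keep the book in the right order.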
Hi Sven,
Here is what I've done:
1. Found 10 Korean pangrams (sentences that contain every letter of the
Korean alphabet, plus punctuation)
2. Opened Notepad++ and pasted each pangram line by line, mixed with
punctuation, changed the encoding to UTF-8, increased the font size to 12 px,
formatted a whole text that
On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov wrote:
> Hi guys,
>
> I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine when
> the selected language is English.
> I tried to teach the system Korean. The first step was creating
> sample data, so I created some TIFF files with Korean in
The generated box file will not contain Korean characters. Use any of the
box editors mentioned on the training page; they were created for that
purpose. A box editor splits the provided TIFF image into blocks, draws a
rectangle around each one, and assigns a value to it. Adjust the size of
these rectangles in
Hi Oleg,
Did you create a file with a mapping of character codes, or a Korean text
file that you printed and scanned in? Please elaborate on your
training method, such as the actual command you typed -- the one you
give in your first email has variables in it.
--Sven
On Thu, Apr 28, 2011 at 11:23 AM,
That's exactly where I started and got stuck. The produced box file does not
contain any Korean characters, only Latin ones. And that is the problem.
On Thu, Apr 28, 2011 at 7:08 PM, Sriranga(78yrsold) wrote:
> Please read the Tesseract 3 wiki, which details how to train a language.
>
> On Thu, Apr 28, 2011 at 9:33
Please read the Tesseract 3 wiki, which details how to train a language.
On Thu, Apr 28, 2011 at 9:33 PM, Oleg Tikhonov wrote:
> Hi guys,
>
> I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine when
> the selected language is English.
> I tried to teach the system Korean. The first step was cr
Hi guys,
I've installed tesseract-ocr 3.0 on Windows 7. Everything works fine when
the selected language is English.
I tried to teach the system Korean. The first step was creating
sample data, so I created some TIFF files with Korean text in them.
Afterwards, I ran the tesseract command:
tesseract [lang].[fontname].ex
I can see that in baseapi.cpp in method "GetHOCRText" there seems to
be support for italic in line 936/937:
if (word->italic > 0)
hocr_str += "<em>";
Does anybody know if that's supposed to work?
TIA
Nikolaj
--
You received this message because you are subscribed to the Google
Groups
Hi,
Hopefully I am in the right place to ask my question. As a person with poor
vision, as well as a user of a screen reader, I find the FreeOCR program a
very welcome tool. Because I use this screen reader, I work on
the PC using shortcut keys. Here is my problem. Making u