[I keep posting this, but it doesn't show up after 1 day. Attempt #4]
I only have a limited number of samples, but that seems to be enough
for now (100% detection rate for everything I have). If I can collate
a few more then I shall retrain and post the updated language files
and maybe a T3 compil
[I keep posting this, but it doesn't show up after several hours.
Removing the link to see...]
I only have a limited number of samples, but that seems to be enough
for now (100% detection rate for everything I have). If I can collate
a few more then I shall retrain and post the updated language fi
I only have a limited number of samples, but that seems to be enough
for now (100% detection rate for everything I have). If I can collate
a few more then I shall retrain and post the updated language files
and maybe a T3 compiled file.
I have posted the source files and the compiled T2 language fi
The TrainingTesseract3 wiki says:
"NOTE The unicharset file must be regenerated whenever inttemp,
normproto and pffmtable are generated (i.e. they must all be recreated
when the box file is changed) as they have to be in sync".
The question is: must the DAWG files be in sync, too? Or can I use DAWG
On 2011-05-27 14:30, Utkarsh wrote:
After manipulating an image, I'm sending a very short string to
tesseract. The image has strings like "1-A", "3-F", etc. The problem is
it cannot detect these characters on their own. I tried adding a dummy
word after it (to make it look like "1-A test") and
Hi Dmitri,
Thank you very much for your hint. I retrained the engine with the
appropriate font and applied thresholding at 80% to my target zone.
Additionally, I used a custom dictionary with the drugs that can be
prescribed. Now I'm reaching nearly 100% accuracy in OCR in my
specific target zone.
Bi
Nice, glad I could help ))
--
Dmitri
On Mon, May 30, 2011 at 5:03 AM, Erik Reisig wrote:
> Hi Dmitri,
>
> Thank you very much for your hint. I retrained the engine with the
> appropriate font and applied thresholding at 80% to my target zone.
> Additionally, I used a custom dictionary with the dr
In previous releases of Tesseract you could use the same unicharset
file as long as the character set stayed the same. In recent releases,
the unicharset file format has been extended and now contains info for
restricting the relative size and position of glyphs. However, you can
still use the old "sho
Hi there,
OK, I figured it out myself. Here are the steps:
1. Create 01.tr with: tesseract 01.tif 01 nobatch box.train
2. Create 02.tr with: tesseract 02.tif 02 nobatch box.train
3. Create unicharset with: unicharset_extractor 01.box 02.box
4. Just copy it (maybe it is not necessary) cp unicharset
0
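The first three steps above can be sketched as a small Python helper that only builds the command lines, so they can be inspected before anything is run. This is a minimal sketch, not the poster's script: the function name training_commands is hypothetical, the base names 01 and 02 are the ones from the steps, and step 4 is left out because the message is cut off there.

```python
import shlex


def training_commands(basenames):
    """Build the command lines for steps 1-3 as argument lists.

    basenames: image/box base names such as "01" (hypothetical helper;
    the commands themselves are the ones quoted in the steps above).
    """
    commands = []
    for base in basenames:
        # Steps 1 and 2: one box.train run per image, producing <base>.tr
        commands.append(["tesseract", f"{base}.tif", base, "nobatch", "box.train"])
    # Step 3: a single unicharset_extractor run over all box files
    commands.append(["unicharset_extractor"] + [f"{base}.box" for base in basenames])
    return commands


if __name__ == "__main__":
    for cmd in training_commands(["01", "02"]):
        print(shlex.join(cmd))
```

To actually execute them, each argument list could be passed to subprocess.run(cmd, check=True), assuming the Tesseract training tools are on the PATH.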
It is written in the training doc [1]:
"…each .tr filename must match an entry in the font_properties file, or
mftraining will abort."
So you could have saved time by reading the documentation.
Zdenko
[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01)
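A quick pre-flight check along these lines can catch the mismatch before mftraining is run. This is only a sketch under two assumptions that are not stated in the message: that .tr files follow the lang.fontname.expN.tr naming convention, and that each font_properties line starts with the font name followed by its flags. The helper names font_name_from_tr and missing_fonts are hypothetical.

```python
def font_name_from_tr(filename):
    """Return the font field of a .tr filename, or None if it has none.

    Assumed convention (not confirmed by the message): <lang>.<fontname>.exp<N>.tr
    """
    parts = filename.split("/")[-1].split(".")
    if len(parts) < 4 or parts[-1] != "tr":
        return None
    return parts[1]


def missing_fonts(tr_filenames, font_properties_text):
    """List the .tr files whose font has no entry in font_properties."""
    # Assumed format: first whitespace-separated field of each line is the font name
    known = {
        line.split()[0]
        for line in font_properties_text.splitlines()
        if line.strip()
    }
    missing = []
    for name in tr_filenames:
        font = font_name_from_tr(name)
        if font is None or font not in known:
            missing.append(name)
    return missing
```

For example, with a font_properties file containing only an entry for "arial", the files eng.times.exp0.tr and 01.tr would both be reported, the latter because its name carries no font field at all under the assumed convention.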
On