Actually, there's an issue already on this point:
http://code.google.com/p/tesseract-ocr/issues/detail?id=455&sort=-id
I don't see any progress on it, though.
Warm regards,
Dmitri Silaev
On Thu, Mar 31, 2011 at 7:55 AM, patrickq wrote:
> Upon further experimentation I think I found out that t
If you're going to elaborate on this issue, it would be great if you share
your findings with the community. This topic might be of interest not only
for newbies but for experienced users too.
Dmitri
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr"
Could you give us a link to where the text of this article can be
downloaded from? Can't find it anywhere, only the title and authors.
On Thu, Mar 31, 2011 at 6:09 AM, Cong Nguyen wrote:
> Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING ALGORITHM
> USING DYNAMIC PROGRAMMING".
>
Liuguanqiang,
Well, now I think I understand what you want. You have a text consisting of
arbitrary characters: digits and Chinese characters. Your goal is to find in
this text a particular fragment, knowing that it can consist only of the
digits 5678. Is that correct?
If so, the first thing to do is to set the
Upon further experimentation I think I found out that the whole
whitelist is rendered irrelevant whenever a character in the blacklist
is NOT in the training set ... this is crazy of course but it appears
to be the case, as if the code handling this list decides to stop
processing the list if one of
That's simple: use "0123456789" as the whitelist and then write code on
top of it to convert the unwanted digits to null. Your code can handle this
instead of Tesseract.
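A minimal sketch of that post-filtering step, assuming the OCR result is already available as a plain string of single-byte characters. The helper name `keepOnly` is hypothetical and is not part of the Tesseract API; here "convert to null" is interpreted as dropping the unwanted characters.

```cpp
#include <string>

// Hypothetical helper (not a Tesseract function): keep only the characters
// of `text` that also appear in `allowed`, and drop everything else.
// Assumes single-byte characters such as ASCII digits.
std::string keepOnly(const std::string& text, const std::string& allowed) {
    std::string out;
    for (char c : text) {
        if (allowed.find(c) != std::string::npos) {
            out.push_back(c);
        }
    }
    return out;
}
```

For instance, `keepOnly(result, "5678")` would be applied to the text recognized with the full "0123456789" whitelist.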
--
Regards,
Saurabh Gandhi
2011/3/31 liuguanqiang
> For example, I use the eng.traineddata(setwhitelist to "0123456
For example, I use eng.traineddata (with the whitelist set to "0123456789") to
recognize the digits in the following picture:
Tesseract outputs the correct result: "24013091"
Now, I know there are only "5678" in the input image, so I set the whitelist
to "5678".
On the above image, the tesseract
I am trying to provide a black list with UTF8 characters specified
using their byte codes, as follows:
// U+FB00 ff ef ac 80 LATIN SMALL LIGATURE FF
// U+FB01 fi ef ac 81 LATIN SMALL LIGATURE FI
myTess->SetVariable("tessedit_char_blackli
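As an illustration of how such a byte string could be built, here is a small sketch that encodes the code points into UTF-8 in plain C++. The helper names `encodeUtf8` and `ligatureBlacklist` are hypothetical. The resulting string could be passed as the value argument of `SetVariable` via `.c_str()`; whether the blacklist then behaves as expected for multi-byte characters is exactly the open question in this thread, so this only shows the encoding step.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// No validation of surrogates or out-of-range values is attempted.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: the two ligatures listed above, as one UTF-8 string
// (bytes ef ac 80 ef ac 81).
std::string ligatureBlacklist() {
    return encodeUtf8(0xFB00) + encodeUtf8(0xFB01);
}
```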
Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING ALGORITHM
USING DYNAMIC PROGRAMMING".
Complexity is: O(n), n is number of pixels.
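The text of the cited paper is not reproduced in this thread, so the following is not its algorithm. It is only a generic sketch of one common way to get O(n) local thresholding: a summed-area table (itself a simple form of dynamic programming) gives the mean of any window in constant time. The function name and the `radius` and `offset` parameters are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: local-mean thresholding of a row-major grayscale image.
// Output is 1 where a pixel is darker than (local mean - offset), else 0.
std::vector<std::uint8_t> adaptiveThreshold(const std::vector<std::uint8_t>& gray,
                                            int width, int height,
                                            int radius, int offset) {
    // Summed-area table with an extra zero row and column at the top/left.
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    std::vector<long long> sat(stride * (static_cast<std::size_t>(height) + 1), 0);
    for (int y = 0; y < height; ++y) {
        long long rowSum = 0;
        for (int x = 0; x < width; ++x) {
            rowSum += gray[static_cast<std::size_t>(y) * width + x];
            sat[(y + 1) * stride + (x + 1)] = sat[y * stride + (x + 1)] + rowSum;
        }
    }

    std::vector<std::uint8_t> out(gray.size(), 0);
    for (int y = 0; y < height; ++y) {
        const int y0 = std::max(y - radius, 0);
        const int y1 = std::min(y + radius, height - 1) + 1;
        for (int x = 0; x < width; ++x) {
            const int x0 = std::max(x - radius, 0);
            const int x1 = std::min(x + radius, width - 1) + 1;
            // Window sum from four table lookups, independent of window size.
            const long long sum = sat[y1 * stride + x1] - sat[y0 * stride + x1]
                                - sat[y1 * stride + x0] + sat[y0 * stride + x0];
            const long long count = static_cast<long long>(x1 - x0) * (y1 - y0);
            const long long pixel = gray[static_cast<std::size_t>(y) * width + x];
            // Compare pixel with (mean - offset) without dividing.
            if (pixel * count < sum - offset * count) {
                out[static_cast<std::size_t>(y) * width + x] = 1;
            }
        }
    }
    return out;
}
```

Each pixel is visited a constant number of times, so the total work grows linearly with the number of pixels regardless of the window radius.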
-Original Message-
From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com]
On Behalf Of Max Cantor
Sent: Thursday, March
Yes. I've had great experience with Sauvola binarization from Leptonica. Gamer
works too but is much, much slower.
On Mar 31, 2011, at 0:02, cong nguyenba wrote:
> I have another approach for you here: try to apply binarization using
> adaptive threshold! Delving into engine by following adaptive
>
Page layout analysis in the Tesseract engine may not be robust enough! You can
find more approaches from the ICDAR conference!
On Wednesday, March 30, 2011, Dmitri Silaev wrote:
> The -psm command line arg does work. In rev580.
> But still an issue in rev549.
>
> So the easiest way for you, Patrick, is to checko
I have another approach for you here: try applying binarization with an
adaptive threshold! For a speedup, delve into the engine by following the
adaptive classification in the source code. I think that should be enough
to meet your expectations!
On Wednesday, March 30, 2011, Dmitri Silaev wrote:
> P.S.: If you're still sur
The -psm command line arg does work in rev580,
but it is still an issue in rev549.
So the easiest way for you, Patrick, is to check out the latest revision...
Regards,
Dmitri
P.S.: If you're still sure that reasonable downscaling of your images
sacrifices the accuracy, please share one or two of your *unprocessed*
images to investigate further.
And I'd suggest keeping up with the latest revisions of Tesseract. The
API changes significantly, but Tess is definitely being
Depending on the quality of your source images, I think it'd be
reasonable to scale them down in order for letters to have the height
of 40 pixels or so. That way Tesseract will just have to do a bit
less work: scan fewer pixels and construct shorter glyph outlines.
The accuracy may suffer ev
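The arithmetic behind that suggestion can be sketched as follows, assuming you have already measured a typical letter height in pixels. The names `scaleForGlyphHeight`, `scaledSize` and `Size` are hypothetical, and the choice never to upscale is an assumption of this sketch; the actual resampling would be done with whatever image library you use.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical image size type for this sketch.
struct Size {
    int width;
    int height;
};

// Scale factor that brings letters down to roughly `targetGlyphHeight` pixels.
// Returns 1.0 (no change) when the letters are already that small or smaller.
double scaleForGlyphHeight(double measuredGlyphHeight, double targetGlyphHeight = 40.0) {
    if (measuredGlyphHeight <= targetGlyphHeight) {
        return 1.0;
    }
    return targetGlyphHeight / measuredGlyphHeight;
}

// Image dimensions after applying the scale, rounded and kept at least 1x1.
Size scaledSize(Size original, double scale) {
    return Size{
        std::max(1, static_cast<int>(std::lround(original.width * scale))),
        std::max(1, static_cast<int>(std::lround(original.height * scale)))};
}
```

With letters measured at about 80 pixels, the factor is 0.5, so a 2480 x 3508 scan would be resampled to 1240 x 1754 before recognition.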
On Wed, Mar 30, 2011 at 8:55 AM, Max Cantor wrote:
> I had a similar issue. I couldn't get the config to work but basically
> added this line to my code and it worked:
>
> api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN);
>
> For some reason, the tesseract binary doesn't pick up the config, b
Hi,
Unfortunately, some fixes regarding the Windows build were committed after
the release of version 3.00 (= revision 498).
I thought about a 3.00.1 release (= revision 525), and as a "temporary solution" I
created a 3.00.1 tesseract.exe (somebody asked for it). Then I changed my mind
because it looks like developers
I had a similar issue. I couldn't get the config to work but basically added
this line to my code and it worked:
api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN);
For some reason, the tesseract binary doesn't pick up the config, but I copied
the binary source and added that.
Max
On Mar
Hello,
I have some problems and many questions, and I hope you will have
answers:
1) When loading the whole project, "combine_tessdata" did not load
with the 17 projects: is this a problem that will cause trouble when
generating tesseract.exe?
2) Should I execute tesseract-3.00.1.exe to have the rig