eaning different spelling standards. Could these
be used for training Tesseract? How do I start?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
--
You received this message because you are subscr
ccess to the
source code.
Oh, really. Is anybody taking the lead,
and do you have any funding for this?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
--
books that Google has already scanned.
Does anybody know of an open source OCR
project that is based on statistics from
scanned books? Could parts of the Tesseract
software library be used to cut out letters
from scanned pages, so some other software
could group them statistically?
--
Lars Aro
pace
be excluded when computing the accuracy?
In my example, only missing characters are counted
as errors; extra characters that were added are not.
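
The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration, not the method used in the example above: it assumes the ground truth and the OCR output are plain strings, it aligns them with the standard library's difflib, and the names count_errors and accuracy are made up for this sketch.

```python
from difflib import SequenceMatcher


def count_errors(truth, ocr):
    """Count characters missing from, and extra in, the OCR output."""
    missing = 0
    extra = 0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Characters in the ground truth that the OCR output lacks.
            missing += i2 - i1
        elif tag == "insert":
            # Characters in the OCR output that the ground truth lacks.
            extra += j2 - j1
        elif tag == "replace":
            # A substitution loses the true characters and adds wrong ones.
            missing += i2 - i1
            extra += j2 - j1
    return {"missing": missing, "extra": extra}


def accuracy(truth, ocr, count_extra=False):
    """Character accuracy relative to the length of the ground truth.

    With count_extra=False only missing characters are errors;
    with count_extra=True added characters are penalised as well.
    """
    if not truth:
        return 0.0
    errors = count_errors(truth, ocr)
    wrong = errors["missing"]
    if count_extra:
        wrong += errors["extra"]
    return max(0.0, 1.0 - wrong / len(truth))
```

With count_extra=False, an output that contains every true character plus some spurious ones still scores 1.0, which is the behaviour questioned above.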
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
--
some
documentation for this file format, so I can read and
understand what's in there? I want to keep the part
about fraktur/blackletter and replace the part about
pre-1870 Danish spelling with something based on my
Swedish dictionaries.
--
Lars Aronsson (l...@aronsson.se)
ng -r319, but then combine_tessdata
doesn't have all these flags.
Still, I'm less interested in running the program than in
understanding the data. Is there no documentation for the format?
Should we write some?
Or is that something you keep internally at Google?
--
Lars Aronsson
Jimmy O'Regan wrote:
> On 24 May 2010 17:41, Lars Aronsson wrote:
>> I tried to compile the current version (svn -r354 up), but failed:
> Looks like a pair of missing casts - have you opened an issue?
No, I have not. I don't know enough of the software.
Err... I have no affiliation w
<http://code.google.com/p/tesseract-ocr/source/detail?r=354>,
Mandrivalinux 2010.1 64bit",
but the compiler error message is full of "inT32"
and the prototype above says "int".
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
--
rks for you?
Yes, this works fine, both "tesseract eurotext.tif output2"
and "combine_tessdata -u dan-frak.traineddata /tmp/foo."
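
For looking at the data rather than running recognition, the unpack step quoted above can be scripted. A minimal sketch, assuming combine_tessdata is installed and on the PATH; the helper names unpack_command, list_components and unpack are made up for this illustration and are not part of Tesseract.

```python
import glob
import subprocess


def unpack_command(traineddata, prefix):
    """Build the argument list for `combine_tessdata -u`."""
    return ["combine_tessdata", "-u", traineddata, prefix]


def list_components(prefix):
    """Return the files on disk whose names start with the prefix."""
    return sorted(glob.glob(glob.escape(prefix) + "*"))


def unpack(traineddata, prefix):
    """Unpack a traineddata file and list the component files written."""
    # check=True raises CalledProcessError if combine_tessdata fails.
    subprocess.run(unpack_command(traineddata, prefix), check=True)
    return list_components(prefix)
```

Calling unpack("dan-frak.traineddata", "/tmp/foo.") runs the same command as above and returns the names of the unpacked files, one per component.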
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
--
om the failure to
explain what Tesseract is.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To post to this group, send email to tesseract-...@googlegroups.c
more than expected.
What to do? Should those who made those files make new versions
that work with the new Tesseract? Or will Tesseract finally
incorporate Fraktur reading without the need to load separate
training files?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free N
source has code to recognize
hyphenated words, and it should be possible to
implement this behaviour as an option.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
--
ired:
I. Den ældre Stenalders Bopladser . 7.
How come? Is it the unusual line spacing that makes Tesseract
confused? Or the dotted line? Why does it fill in letters
where there should be word-separating spaces?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic lite