Hello Ramon,
tesseract-ocr is developed by Google (see
http://groups.google.com/group/tesseract-ocr/msg/7408c699e27db341). I
hope that once some or all of the open issues are solved, the final
version of tesseract-ocr 3.00 will be released, including tif+box files...
Zd.
On 20.05.2010 10:53, Ramon wrote:
On 20 May 2010, at 09:53, Ramon wrote:
Hi Zdenko,
After some tests, I realized I need the tiff/box pairs that the
creators used to generate the Catalan tessdata file.
Do you know a way to contact them?
That might be difficult. As you said before, you might be able to
reuse the Spanish fil
Hi Zdenko,
After some tests, I realized I need the tiff/box pairs that the
creators used to generate the Catalan tessdata file.
Do you know a way to contact them?
Ramon.
On 29 Apr, 23:49, Zdenko Podobný wrote:
> Hi Ramon,
>
> I do not have source files for dawg dictionaries and I am not abl
Hi Ramon,
I do not have source files for the dawg dictionaries and I am not able to
"decompile" them. Anyway, I think creating dictionaries is the easiest
part of tesseract training: according to the wiki[1], the input is a simple
UTF-8 file with one word per line. This file is split into several files:
* lang.pu
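
A minimal sketch of preparing the kind of input described above (a UTF-8 file with one word per line) from a plain-text corpus. The function names, the tokenizing regular expression, and the frequency ordering are assumptions made for illustration; they are not taken from the tesseract wiki, and the later split into the individual dictionary files is not covered here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a "word" is a run of Unicode letters, optionally joined by an
# apostrophe, a hyphen, or a middle dot (as in Catalan "paral·lel").
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u00b7-][^\W\d_]+)*")


def build_wordlist(text):
    """Return the unique lowercase words in text, most frequent first."""
    counts = Counter(m.group(0).lower() for m in WORD_RE.finditer(text))
    # Sort by descending count, then alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ordered]


def write_wordlist(words, path):
    """Write one word per line as UTF-8 (hypothetical helper)."""
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
```

For example, `build_wordlist("El gat i el gos. El paral·lel camí.")` returns `["el", "camí", "gat", "gos", "i", "paral·lel"]`, and `write_wordlist` saves that list in the one-word-per-line layout.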
Thanks for your quick answer, Zdenko.
As you pointed out, I'm already using tif/box pairs from the Spanish
language to train my Catalan .traineddata language (as Spanish
characters suit Catalan characters too).
But doing just this (with no words in dictionary files) the dictionary
is not quite good. I t
Hello Ramon,
To extend an existing language you need "tif/box pairs"; see
http://code.google.com/p/tesseract-ocr/wiki/FAQ, in particular the entry "How do I add just
one character or one font to my favourite language, without having to
retrain from scratch?"
Unfortunately tif/box pairs are provided only for eng
Hi,
After some tests I realized the best option for me is to put effort into
extending the Catalan dictionary that is in the svn repository (v3).
It would be very useful if you could do one of these:
-> deliver the different files combined to create the cat.traineddata
unified file. (the utf8 files used to generate t