subject:"Extracting files from .tessdata"

Re: Extracting files from .tessdata

2010-05-22 Thread Zdenko Podobný

Hello Ramon, tesseract-ocr is developed by Google (see http://groups.google.com/group/tesseract-ocr/msg/7408c699e27db341). I hope that once all or some of the issues are solved, the final version of tesseract-ocr 3.00 will be released, including tif+box files... Zd. On 20.05.2010 10:53, Ramon wrote

Re: Extracting files from .tessdata

2010-05-21 Thread Jimmy O'Regan

On 20 May 2010, at 09:53, Ramon wrote: Hi Zdenko, After some tests, I realized I need the tif/box pairs that the creators used to generate the Catalan tessdata file. Do you know a way to contact them? That might be difficult. As you said before, you might be able to reuse the Spanish fil

Re: Extracting files from .tessdata

2010-05-21 Thread Ramon

Hi Zdenko, After some tests, I realized I need the tif/box pairs that the creators used to generate the Catalan tessdata file. Do you know a way to contact them? Ramon. On 29 Apr, 23:49, Zdenko Podobný wrote: > Hi Ramon, > > I do not have source files for dawg dictionaries and I am not abl

Re: Extracting files from .tessdata

2010-04-29 Thread Zdenko Podobný

Hi Ramon, I do not have source files for dawg dictionaries and I am not able to "decompile" them. Anyway, I think creating dictionaries is the easiest part of tesseract training: according to the wiki[1], the input is a simple UTF-8 file with one word per line. This file is split into several files: * lang.pu

Re: Extracting files from .tessdata

2010-04-29 Thread Ramon

Thanks for your quick answer, Zdenko. As you pointed out, I'm already using tif/box pairs from the Spanish language to train my Catalan .traineddata language (as Spanish characters suit Catalan too). But doing just this (with no words in the dictionary files), the dictionary is not very good. I t

Re: Extracting files from .tessdata

2010-04-28 Thread zdenko podobny

Hello Ramon, to extend an existing language you need "tif/box pairs"; see http://code.google.com/p/tesseract-ocr/wiki/FAQ, under "How do I add just one character or one font to my favourite language, without having to retrain from scratch?" Unfortunately, tif/box pairs are provided only for eng

Extracting files from .tessdata

2010-04-28 Thread Ramon

Hi, After some tests I realized the best option for me is to put effort into extending the Catalan dictionary that is in the svn repository (v3). It would be very useful if you could do one of these: -> deliver the different files that were combined to create the unified cat.traineddata file. (the utf8 files used to generate t

7 matches