On Tue, Apr 17, 2012 at 4:26 PM, Nick White <nick.wh...@durham.ac.uk> wrote:

> On Mon, Apr 16, 2012 at 06:38:01PM +0200, zdenko podobny wrote:
> > I think 3.02 will provide a solution for these cases: you can use more
> > than one language for OCR, e.g. you can run something like this:
> >
> > tesseract image output -l grc+ell
>
> Ah, that's a very good idea, and will indeed be useful. However for
> my use case (a script which is mostly the same, but with additions,
> and an older version of the language), it would be useful to only
> use one set of dictionary files (rather than presumably the union of
> grc & ell, in the above example).
>
> I wonder if there's any good way of integrating this functionality
> into tesseract; I could imagine changing the dictionary files
> wouldn't be a particularly unusual thing to want to do, as the mapping
> between dictionaries and scripts is not going to be 1:1.
>
> As a workaround one could probably unpack the traineddata, remove
> the dictionary files (and add different ones if appropriate), then
> repack it. But ideally I think it would be good to be able to
> specify different dictionary files on the command line (and ideally
> as UTF-8 word per line files, which were converted into DAWG in
> memory if needed.)
>
Do you mean something like "CONFIG FILES AND AUGMENTING WITH USER DATA"
[1], without user-patterns?

[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
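
In the meantime, the unpack/remove/repack workaround quoted above could
probably be scripted along these lines. This is only a rough sketch, not
something taken from the tesseract sources: the function names are made
up for illustration, it assumes combine_tessdata is on the PATH, and it
assumes that the word-dawg and freq-dawg components are the "dictionary
files" in question (punc-dawg and number-dawg are left in place).

```python
import subprocess
from pathlib import Path

# Assumption: these two components hold the word lists. punc-dawg and
# number-dawg are left alone because they describe patterns, not words.
DICTIONARY_SUFFIXES = ("word-dawg", "freq-dawg")


def remove_dictionaries(workdir, lang, suffixes=DICTIONARY_SUFFIXES):
    """Delete <lang>.<suffix> files from a directory of unpacked components.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for suffix in suffixes:
        component = Path(workdir) / f"{lang}.{suffix}"
        if component.exists():
            component.unlink()
            removed.append(component.name)
    return removed


def repack_without_dictionaries(traineddata, workdir, lang):
    """Unpack a traineddata file, drop its dictionaries, and repack it.

    Hypothetical helper; requires the combine_tessdata tool.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    prefix = str(workdir / lang) + "."  # e.g. "work/grc."

    # "combine_tessdata -u FILE PREFIX" unpacks every component to PREFIX*
    subprocess.run(["combine_tessdata", "-u", str(traineddata), prefix],
                   check=True)
    removed = remove_dictionaries(workdir, lang)
    # "combine_tessdata PREFIX" rebuilds PREFIXtraineddata from what is left
    subprocess.run(["combine_tessdata", prefix], check=True)
    return removed
```

Calling repack_without_dictionaries("tessdata/grc.traineddata", "work",
"grc") should then leave a work/grc.traineddata without the two word-list
components. To swap in a different dictionary instead, a replacement
grc.word-dawg could be built with wordlist2dawg and placed in the work
directory before the repack step.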

-- 
Zdenko

