On Tue, Apr 17, 2012 at 4:26 PM, Nick White <nick.wh...@durham.ac.uk> wrote:
> On Mon, Apr 16, 2012 at 06:38:01PM +0200, zdenko podobny wrote:
> > I think 3.02 will provide a solution for these cases: you can use more
> > than one language for OCR, e.g. you can run something like this:
> >
> > tesseract image output -l grc+ell
>
> Ah, that's a very good idea, and will indeed be useful. However, for
> my use case (a script which is mostly the same, but with additions,
> and an older version of the language), it would be useful to only
> use one set of dictionary files (rather than presumably the union of
> grc & ell, in the above example).
>
> I wonder if there's any good way of integrating this functionality
> into tesseract; I could imagine changing the dictionary files
> wouldn't be a particularly unusual thing to want to do, as the mapping
> of dictionaries to scripts is not going to be 1:1.
>
> As a workaround one could probably unpack the traineddata, remove
> the dictionary files (and add different ones if appropriate), then
> repack it. But ideally I think it would be good to be able to
> specify different dictionary files on the command line (and ideally
> as UTF-8 word-per-line files, which would be converted into DAWG in
> memory if needed).

Do you mean something like "CONFIG FILES AND AUGMENTING WITH USER DATA" [1]
without user-patterns?

[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html

--
Zdenko
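
For reference, the unpack / remove dictionaries / repack workaround mentioned in the quoted message could be sketched roughly as below. This is only a sketch under some assumptions: that combine_tessdata is on the PATH, that the dictionary components are the word-dawg and freq-dawg files, and that repacking without them is acceptable for the language in question. The function names and the working directory layout are made up for illustration and are not part of tesseract.

```python
import subprocess
from pathlib import Path

# Assumed dictionary component suffixes inside a traineddata file.
# Adjust the tuple if your traineddata carries other dictionary components.
DICT_SUFFIXES = ("word-dawg", "freq-dawg")


def unpack_command(traineddata, workdir, lang):
    """Build the argv that unpacks all components to <workdir>/<lang>.*"""
    prefix = Path(workdir) / lang
    return ["combine_tessdata", "-u", str(traineddata), f"{prefix}."]


def dictionary_paths(workdir, lang):
    """Paths of the unpacked dictionary components to remove or replace."""
    return [Path(workdir) / f"{lang}.{suffix}" for suffix in DICT_SUFFIXES]


def repack_command(workdir, lang):
    """Build the argv that recombines <workdir>/<lang>.* into a traineddata."""
    return ["combine_tessdata", f"{Path(workdir) / lang}."]


def strip_dictionaries(traineddata, workdir, lang):
    """Unpack, delete the dictionary components, and repack (hypothetical helper)."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(unpack_command(traineddata, workdir, lang), check=True)
    for path in dictionary_paths(workdir, lang):
        if path.exists():
            path.unlink()  # drop in a replacement DAWG here instead, if wanted
    subprocess.run(repack_command(workdir, lang), check=True)
```

Calling strip_dictionaries("grc.traineddata", "work", "grc") should then leave a repacked grc.traineddata under work/. A replacement word list would first have to be converted with wordlist2dawg before repacking.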