Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-31 Thread Matthew Hodgskiss
Thanks very much for the advice. The ocr-evaluation tools look particularly useful.
On Friday, 25 January 2019 12:04:13 UTC, shree wrote:
> also see
> https://github.com/impactcentre/ocrevalUAtion
> https://github.com/Shreeshrii/ocr-evaluation-tools
> https://github.com/tesseract-ocr/test/

Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Shree Devi Kumar
also see
https://github.com/impactcentre/ocrevalUAtion
https://github.com/Shreeshrii/ocr-evaluation-tools
https://github.com/tesseract-ocr/test/tree/master/unlvtests
On Fri, Jan 25, 2019 at 5:17 PM Lorenzo Bolzani wrote:
> This is an option if you want to consider missing/extra chars too:

Re: [tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Lorenzo Bolzani
This is an option if you want to consider missing/extra chars too:
https://en.wikipedia.org/wiki/Levenshtein_distance
You should be able to find implementations for most languages.
Bye
Lorenzo
On Fri, 25 Jan 2019 at 11:56, Matthew Hodgskiss <matthew.hodgsk...@gmail.com> wrote:
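For readers who want to try the Levenshtein suggestion directly, here is a minimal sketch in Python. It is not taken from any of the tools linked in this thread; the function names `levenshtein` and `character_error_rate` are made up for illustration, and the normalisation by ground-truth length is one common convention, assumed here rather than prescribed by the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance divided by ground-truth length (assumed convention)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)
```

Because missing and extra characters each count as one edit, this measure penalises dropped or hallucinated characters as well as misread ones. The rate can exceed 1.0 when the OCR output is much longer than the ground truth.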

[tesseract-ocr] Evaluating Tesseract with new domain-specific documents

2019-01-25 Thread Matthew Hodgskiss
Hi, I am interested in evaluating the performance of Tesseract against some domain-specific test. I would like to establish a baseline using vanilla settings and then compare it with some domain-specific user-words and user-patterns as documented here
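A baseline-versus-tuned comparison of the kind described could be scripted roughly as follows. This is only a sketch: the helper name `build_tesseract_command` and all file names are hypothetical, and it assumes a Tesseract 4.x command line that accepts the `--user-words` and `--user-patterns` options. Whether those lists actually influence recognition depends on the Tesseract version and OCR engine in use, so check that on your own installation.

```python
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str,
    user_words: Optional[str] = None,
    user_patterns: Optional[str] = None,
    lang: str = "eng",
) -> List[str]:
    """Build an argument list for one Tesseract run (hypothetical helper)."""
    command = ["tesseract", image_path, output_base, "-l", lang]
    if user_words:
        # Plain-text file of domain words, one per line (assumed layout).
        command += ["--user-words", user_words]
    if user_patterns:
        # Plain-text file of patterns, one per line (assumed layout).
        command += ["--user-patterns", user_patterns]
    return command


# Baseline run with vanilla settings; file names are placeholders.
baseline = build_tesseract_command("page.png", "out_baseline")

# Same page with the domain-specific lists supplied.
tuned = build_tesseract_command(
    "page.png", "out_tuned",
    user_words="domain.user-words",
    user_patterns="domain.user-patterns",
)
```

Each list can be passed to `subprocess.run(command, check=True)`; the two resulting text files can then be scored against the same ground truth to see whether the domain lists help.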