Thanks very much for the advice. The ocr-evaluation tools look particularly 
useful.
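
For anyone who finds this thread later, here is a minimal sketch of the Levenshtein-based calculation suggested below, applied to both characters and words. It is not taken from Tesseract or from any of the linked tools; the function names (levenshtein, error_rate, cer, wer) are made up for illustration, and it assumes the ground truth and the OCR output are already available as plain strings. Words are split on whitespace only, which is a simplification.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # item missing from hyp
                            curr[j - 1] + 1,      # extra item in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the length of the reference."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


def cer(reference, ocr_output):
    """Character error rate: edits counted per reference character."""
    return error_rate(reference, ocr_output)


def wer(reference, ocr_output):
    """Word error rate: edits counted per whitespace-separated word."""
    return error_rate(reference.split(), ocr_output.split())


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the quick brown box"
    print("CER:", cer(truth, ocr))
    print("WER:", wer(truth, ocr))
```

Because the distance is divided by the reference length, either rate can exceed 1.0 when the OCR output contains many extra characters or words. For large documents a dedicated library or one of the tools linked below is likely to be faster than this pure-Python loop.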

On Friday, 25 January 2019 12:04:13 UTC, shree wrote:
>
> also see
>
> https://github.com/impactcentre/ocrevalUAtion
>
> https://github.com/Shreeshrii/ocr-evaluation-tools
>
> https://github.com/tesseract-ocr/test/tree/master/unlvtests
>
>
>
> On Fri, Jan 25, 2019 at 5:17 PM Lorenzo Bolzani <l.bo...@gmail.com> wrote:
>
>> This is an option if you want to consider missing/extra chars too: 
>>
>> https://en.wikipedia.org/wiki/Levenshtein_distance
>>
>> You should be able to find implementations for most languages.
>>
>>
>> Bye
>>
>> Lorenzo
>>
>>
>>
>> On Fri, 25 Jan 2019 at 11:56, Matthew Hodgskiss <matthew....@gmail.com> 
>> wrote:
>>
>>> Hi,
>>>
>>> I am interested in evaluating the performance of Tesseract on some 
>>> domain-specific test documents. I would like to establish a baseline using 
>>> vanilla settings and then compare it against runs that use domain-specific 
>>> user-words and user-patterns, as documented here 
>>> <https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage>.
>>> Is it possible to reuse the OCR evaluation process that must be 
>>> performed during model training to calculate word and character error rates 
>>> on new (domain-specific) documents?
>>>
>>> If this is not possible, then I could synthesise my own scan images from 
>>> documents using ImageMagick 
>>> <https://gist.github.com/ThisIsBenny/1e669954d0fd0a945e38d0670c670c3c> 
>>> but it would be good if anyone could recommend a standard 
>>> algorithm/library for calculating character and word error rates.
>>>
>>> Thanks in advance
>>>
>>> Matt
>>>
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/5cb0a65c-dae5-431b-9d0c-2c099d2cf90b%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
>
> -- 
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>

