Thanks very much for the advice. The ocr-evaluation tools look particularly useful.
On Friday, 25 January 2019 12:04:13 UTC, shree wrote:
>
> also see
>
> https://github.com/impactcentre/ocrevalUAtion
>
> https://github.com/Shreeshrii/ocr-evaluation-tools
>
> https://github.com/tesseract-ocr/test/tree/master/unlvtests
>
>
>
> On Fri, Jan 25, 2019 at 5:17 PM Lorenzo Bolzani <l.bo...@gmail.com> wrote:
>
>> This is an option if you want to consider missing/extra chars too:
>>
>> https://en.wikipedia.org/wiki/Levenshtein_distance
>>
>> You should be able to find implementations for most languages.
>>
>>
>> Bye
>>
>> Lorenzo
>>
>>
>>
>> On Fri, 25 Jan 2019 at 11:56, Matthew Hodgskiss <matthew....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am interested in evaluating the performance of Tesseract against some
>>> domain-specific test. I would like to perform a baseline using vanilla
>>> settings and then with some domain-specific user-words and user-patterns as
>>> documented here
>>> <https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage>.
>>> Is it possible to leverage the OCR evaluation process, which must be
>>> performed during model training, to calculate word and character error rates
>>> on new (domain-specific) documents?
>>>
>>> If this is not possible, then I could synthesise my own scan images from
>>> documents using ImageMagick
>>> <https://gist.github.com/ThisIsBenny/1e669954d0fd0a945e38d0670c670c3c>
>>> but it would be good if anyone could recommend a standard
>>> algorithm/library for calculating character and word error rates.
>>>
>>> Thanks in advance
>>>
>>> Matt
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/5cb0a65c-dae5-431b-9d0c-2c099d2cf90b%40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
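For anyone who lands on this thread wanting the Levenshtein approach Lorenzo suggested rather than one of the linked tools, here is a minimal sketch of how character error rate (CER) and word error rate (WER) are commonly derived from edit distance: the minimum number of insertions, deletions and substitutions needed to turn the ground truth into the OCR output, divided by the length of the ground truth. The same distance function is applied to characters for CER and to whitespace-split words for WER. This is an illustration under simple assumptions, not the algorithm used by ocrevalUAtion, ocr-evaluation-tools or the UNLV tests: the function names `levenshtein`, `cer` and `wer` are hypothetical, and no normalisation of case, whitespace or Unicode is done. Note that the rate can exceed 1.0 when the OCR output contains many extra characters.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions turning ref into hyp.

    Works on any sequence: a string (character level) or a list of words (word level).
    """
    # prev[j] is the edit distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free if the items match)
            )
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edit distance over reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edit distance over the number of reference words."""
    ref_words = reference.split()  # naive whitespace tokenisation (an assumption)
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ground_truth = "the quick brown fox"
    ocr_output = "the qu1ck brown fox"  # one misread character
    print(f"CER: {cer(ground_truth, ocr_output):.3f}")  # 1 edit / 19 chars -> 0.053
    print(f"WER: {wer(ground_truth, ocr_output):.3f}")  # 1 edit / 4 words -> 0.250
```

For the baseline vs. user-words/user-patterns comparison Matt describes, the idea would be to run both configurations over the same images, compute these rates per page against the same ground-truth text, and compare averages. The dynamic-programming table is kept to two rows, so memory stays linear in the length of the OCR output, which matters for whole-page character comparisons.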