[tesseract-ocr] Re: Using tesseract_best (or other models?) for 18th-century English printed text

Tom Morris Sun, 25 May 2025 10:54:21 -0700

On Monday, April 21, 2025 at 12:03:33 PM UTC-4 mcarlo...@gmail.com wrote:



Honestly, I am getting as many errors (or even more) as with the 
standard model. I am trying to automatically transcribe documents such 
as the one attached (a simple excerpt from a longer file; see also e.g. 
https://royalsocietypublishing.org/doi/epdf/10.1098/rstl.1720.0013). *Any 
idea if there are more suitable models for this kind of 18th-century 
document?* (It looks like an 18th-century Caslon font, which uses the long s 
<https://en.wikipedia.org/wiki/Long_s> quite often.)


You might want to look at the work done by the Early Modern OCR Project 
(eMOP): https://emop.tamu.edu/

Tom 

