[tesseract-ocr] Swedish and Danish Fraktur

Lars Aronsson Mon, 26 Dec 2022 03:58:50 -0800

I run a standard Ubuntu Linux installation, which now means Tesseract 5.1.0.


Back under Tesseract 4.0 and 4.1, there were two separate files,
swe-frak.traineddata and dan_frak.traineddata, that one could
download to get very good results on old Swedish and Danish
book pages set in Gothic/Fraktur type. When I upgraded Ubuntu
Linux and got Tesseract 5.1, these files were missing.

When I copy the old file from
/usr/share/tesseract-ocr/4.00/tessdata/dan_frak.traineddata
to
/usr/share/tesseract-ocr/5/tessdata/dan_frak.traineddata
Tesseract 5 does not produce very useful results with it. There are
lots of errors, far more than expected.
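For anyone wanting to reproduce the comparison, here is a minimal Python sketch of the copy step, assuming the Ubuntu paths given above and a `tesseract` binary on the PATH. The helper names (`build_command`, `copy_model`, `ocr`) are hypothetical; `-l`, `--tessdata-dir` and the `stdout` output base are standard Tesseract command-line options. It only automates the copy and the OCR run; it does not address the accuracy problem.

```python
import shutil
import subprocess
from pathlib import Path

# Ubuntu tessdata locations as given in the post; adjust for other systems.
OLD_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata")
NEW_TESSDATA = Path("/usr/share/tesseract-ocr/5/tessdata")


def build_command(image, lang, tessdata_dir=None):
    """Return argv for `tesseract IMAGE stdout -l LANG [--tessdata-dir DIR]`."""
    cmd = ["tesseract", str(image), "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Point Tesseract at a specific directory of .traineddata files.
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def copy_model(lang, src_dir=OLD_TESSDATA, dst_dir=NEW_TESSDATA):
    """Copy LANG.traineddata from src_dir to dst_dir (usually needs root under /usr/share)."""
    src = Path(src_dir) / f"{lang}.traineddata"
    dst = Path(dst_dir) / f"{lang}.traineddata"
    shutil.copy2(src, dst)
    return dst


def ocr(image, lang, tessdata_dir=None):
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_command(image, lang, tessdata_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage, with a hypothetical scan `page.png`: call `copy_model("dan_frak")` once (as root), then `print(ocr("page.png", "dan_frak"))`.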

What should be done? Should the people who made those files produce
new versions that work with the new Tesseract? Or will Tesseract
finally incorporate Fraktur recognition without the need to load
separate training files?


--
  Lars Aronsson ([email protected])
  Project Runeberg - free Nordic literature - http://runeberg.org/
