[tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

Keith M Sun, 13 Dec 2020 22:40:55 -0800

Hi there,

I've been circling a problem with OCR'ing 90-pages of 30 year old BASIC 
code. I've been working on optimizing my scanning settings, and 
pre-processing, stuck in photoshop for hours messing around. Long couple 
days with this stuff!


I've been through tessdoc, through the FAQ, through wikipedia reading about 
morphological operators. Through PPAs for 5.0.0-alpha-833-ga06c.

I'm getting OK results so far, but need to process more images, my workflow 
is tedious.

Sample image here
https://www.techtravels.org/wp-content/uploads/2020/12/FNBBS-02_crop.png

150dpi image extracted via pdftoppm -png from a 1200dpi scan. While it's 
not super clear to me why, higher res scans are resulting in WORSE OCR's.

*TLDR; What should be the ideal configuration of tesseract for my 
application? Disable the dictionary? Can I add BASIC commands and keywords 
to eng.user-words? From the manual "CONFIG FILES AND AUGMENTING WITH USER 
DATA" section ??*

I could use some help, thanks!

Keith

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/d17a46e5-ca36-493b-9eee-078266bfb116n%40googlegroups.com.

[tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

Reply via email to