Hi there, I've been circling a problem with OCR'ing 90-pages of 30 year old BASIC code. I've been working on optimizing my scanning settings, and pre-processing, stuck in photoshop for hours messing around. Long couple days with this stuff!
I've been through tessdoc, through the FAQ, through wikipedia reading about morphological operators. Through PPAs for 5.0.0-alpha-833-ga06c. I'm getting OK results so far, but need to process more images, my workflow is tedious. Sample image here https://www.techtravels.org/wp-content/uploads/2020/12/FNBBS-02_crop.png 150dpi image extracted via pdftoppm -png from a 1200dpi scan. While it's not super clear to me why, higher res scans are resulting in WORSE OCR's. *TLDR; What should be the ideal configuration of tesseract for my application? Disable the dictionary? Can I add BASIC commands and keywords to eng.user-words? From the manual "CONFIG FILES AND AUGMENTING WITH USER DATA" section ??* I could use some help, thanks! Keith -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d17a46e5-ca36-493b-9eee-078266bfb116n%40googlegroups.com.