Hi there,

I've been circling a problem with OCRing 90 pages of 30-year-old BASIC 
code. I've been working on optimizing my scanning settings and 
pre-processing, and I've been stuck in Photoshop for hours messing around. 
It's been a long couple of days with this stuff!

I've been through tessdoc, through the FAQ, and through Wikipedia reading 
about morphological operators. I've also been through PPAs for 
5.0.0-alpha-833-ga06c.

I'm getting OK results so far, but I need to process more images and my 
workflow is tedious.

Sample image here

The sample is a 150 dpi image extracted via pdftoppm -png from a 1200 dpi 
scan. It's not clear to me why, but higher-resolution scans are resulting 
in WORSE OCR results.
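
To compare resolutions less tediously, this is roughly how I'd script the 
extraction step. It's only a sketch: scan.pdf, the output prefixes and the 
list of DPI values are placeholders I made up, and the only pdftoppm options 
it relies on are -r (resolution) and -png.

```python
import shlex


def pdftoppm_cmd(pdf_path, out_prefix, dpi):
    """Build a pdftoppm command that renders pdf_path to PNG pages at dpi."""
    # -r sets the output resolution in DPI; -png selects PNG output.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


if __name__ == "__main__":
    # Placeholder file name and DPI values, chosen only for illustration.
    for dpi in (150, 200, 300, 400):
        cmd = pdftoppm_cmd("scan.pdf", f"page-{dpi}dpi", dpi)
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to run it.
        print(shlex.join(cmd))
```

Each resolution gets its own output prefix, so the same page can be run 
through tesseract at several sizes and the results compared side by side.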

*TL;DR: What is the ideal tesseract configuration for my application? 
Should I disable the dictionary? Can I add BASIC commands and keywords to 
eng.user-words, as described in the "CONFIG FILES AND AUGMENTING WITH USER 
DATA" section of the manual?*
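
To make the question concrete, here is a rough sketch of the kind of 
invocation I have in mind. These are my assumptions, not something I've 
confirmed: that --user-words with a plain one-word-per-line file is the right 
way to supply keywords, that load_system_dawg=0 and load_freq_dawg=0 are the 
right way to switch off the built-in dictionaries, that --psm 6 suits a code 
listing, and that the LSTM engine in this alpha build honors any of it. The 
keyword list and file names are placeholders.

```python
from pathlib import Path

# A small, incomplete sample of BASIC keywords, for illustration only.
BASIC_KEYWORDS = ["PRINT", "GOTO", "GOSUB", "RETURN", "INPUT", "THEN",
                  "NEXT", "DATA", "READ", "DIM", "REM", "FOR", "IF", "END"]


def write_user_words(path, words):
    """Write one word per line, the format assumed here for a user-words file."""
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


def tesseract_cmd(image, out_base, user_words=None, use_dictionary=False, psm=6):
    """Build a tesseract command line for OCRing one page of a code listing."""
    cmd = ["tesseract", image, out_base, "-l", "eng", "--psm", str(psm)]
    if user_words:
        cmd += ["--user-words", str(user_words)]
    if not use_dictionary:
        # Assumption: these two variables turn off the built-in word lists.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    # Keep runs of spaces, since indentation and column layout matter in code.
    cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


if __name__ == "__main__":
    words_file = write_user_words("basic.user-words", BASIC_KEYWORDS)
    print(" ".join(tesseract_cmd("page-01.png", "page-01", user_words=words_file)))
```

If this is the wrong approach entirely, that would be useful to know too.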

I could use some help, thanks!

