[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

shree Fri, 01 Jan 2021 19:03:43 -0800

Please see the old thread at
https://groups.google.com/g/tesseract-ocr/c/ApM_TqwV7aE/m/z5jZV0I0AgAJ
for a link to a completed dot matrix project.

On Monday, December 14, 2020 at 12:11:00 PM UTC+5:30 Keith M wrote:

> Hi there,
>
> I've been circling a problem with OCR'ing 90 pages of 30-year-old BASIC 
> code. I've been working on optimizing my scanning settings and 
> pre-processing, and have been stuck in Photoshop for hours messing around. 
> It's been a long couple of days with this stuff!
>
> I've been through tessdoc, through the FAQ, and through Wikipedia reading 
> about morphological operators. I've also been through PPAs for 
> 5.0.0-alpha-833-ga06c.
>
> I'm getting OK results so far, but I need to process more images and my 
> workflow is tedious.
>
> Sample image here
> https://www.techtravels.org/wp-content/uploads/2020/12/FNBBS-02_crop.png
>
> This is a 150dpi image extracted via pdftoppm -png from a 1200dpi scan. 
> While it's not super clear to me why, higher-res scans are giving WORSE 
> OCR results.
>
> *TL;DR: What is the ideal tesseract configuration for my application? 
> Should I disable the dictionary? Can I add BASIC commands and keywords 
> to eng.user-words, as described in the manual's "CONFIG FILES AND 
> AUGMENTING WITH USER DATA" section?*
>
> I could use some help, thanks!
>
> Keith
>
>
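
On the configuration question in the quoted message, here is one possible starting point, as a rough sketch rather than a tested recipe. It writes a user-words file (one word per line) and builds a tesseract command line that passes it with --user-words and switches off the built-in dictionaries with -c load_system_dawg=0 and -c load_freq_dawg=0. The keyword list, the file names, and the choice of --psm 6 are assumptions made for illustration, and how much the word list and the dictionary switches actually affect the LSTM engine would need to be checked on the real scans.

```python
import shlex
from pathlib import Path

# Illustrative subset of BASIC keywords only; extend it from the actual listing.
BASIC_KEYWORDS = ["PRINT", "INPUT", "GOTO", "GOSUB", "RETURN", "NEXT", "THEN", "DATA"]


def write_user_words(path, words):
    """Write one word per line, the layout used for a user-words file."""
    target = Path(path)
    target.write_text("\n".join(words) + "\n", encoding="utf-8")
    return target


def build_tesseract_command(image, output_base, user_words=None,
                            use_dictionaries=False, psm=6):
    """Return the tesseract argument list without running anything."""
    cmd = ["tesseract", str(image), str(output_base), "-l", "eng", "--psm", str(psm)]
    if user_words is not None:
        cmd += ["--user-words", str(user_words)]
    if not use_dictionaries:
        # Turn off the main and frequent-word dictionaries for non-prose text.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    # Keep runs of spaces, which matter for column-aligned source code.
    cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; substitute the real page image and output base.
    words_file = write_user_words("basic.user-words", BASIC_KEYWORDS)
    command = build_tesseract_command("FNBBS-02_crop.png", "FNBBS-02", words_file)
    print(shlex.join(command))
```

Building the argument list separately from running it makes it easy to print the command, compare runs with and without the dictionaries, and then pass the list to subprocess.run once a combination looks promising.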

