[tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2020-12-13 Keith M
Hi there, I've been circling a problem with OCR'ing 90 pages of 30-year-old BASIC code. I've been working on optimizing my scanning settings and pre-processing, and I've been stuck in Photoshop for hours messing around. A long couple of days with this stuff! I've been through tessdoc, through the FAQ, through w…
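Dot-matrix characters are built from separate dots, and OCR engines generally do better on continuous strokes. One common pre-processing step (an assumption here; the post doesn't say which steps were tried) is a morphological closing: dilate the ink, then erode it, so the small gaps between dots get bridged. The sketch below is pure Python on a binarized bitmap (1 = ink) so it runs without dependencies; `close_gaps`, `dilate` and `erode` are hypothetical names. In practice a library routine such as OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` (on an image where the ink is the foreground) does the same job much faster.

```python
from typing import List

# Assumed representation: a binarized scan, 1 = ink pixel, 0 = paper.
Bitmap = List[List[int]]


def dilate(img: Bitmap, radius: int = 1) -> Bitmap:
    """Binary dilation with a (2*radius+1)-square structuring element."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Spread each ink pixel over its square neighbourhood.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[ny][nx] = 1
    return out


def erode(img: Bitmap, radius: int = 1) -> Bitmap:
    """Binary erosion; neighbours outside the image are ignored."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Keep the pixel only if every in-bounds neighbour is ink.
            out[y][x] = int(all(
                img[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ))
    return out


def close_gaps(img: Bitmap, radius: int = 1) -> Bitmap:
    """Morphological closing (dilate, then erode) to bridge gaps between dots."""
    return erode(dilate(img, radius), radius)


if __name__ == "__main__":
    # A vertical stroke printed as three separate dots with one-pixel gaps.
    dots = [[0] * 5 for _ in range(9)]
    for row in (2, 4, 6):
        dots[row][2] = 1
    # Prints the stroke as one unbroken column of '#' from row 2 to row 6.
    for row in close_gaps(dots):
        print("".join("#" if p else "." for p in row))
```

The radius has to match the dot spacing at the scan resolution: too small and the dots stay separate, too large and neighbouring characters start to merge.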

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Keith M
> https://groups.google.com/g/tesseract-ocr/c/ApM_TqwV7aE/m/z5jZV0I0AgAJ
> for link to a completed project for dot matrix
>
> On Monday, December 14, 2020 at 12:11:00 PM UTC+5:30 Keith M wrote:
>
>> Hi there,
>>
>> I've been circling a problem with OCR'ing…

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Keith M
…I'm done), but I think it's neat, and I like learning about new technology. I hope the group finds this info useful.

Thanks,
Keith

On Friday, January 1, 2021 at 11:32:40 PM UTC-5 Keith M wrote:
> Ger,
>
> Thanks for taking the time to reply.
>
> On 1/1/2021 4:00 PM, Ge…

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-04 Keith M
Hello again Alex,

Thanks for the conversation. Someone has offered to modify a similar, but slightly different, font for me, which could allow some optimization of recognition. For instance, ABBYY FineReader accepts a font file, and if you provide a matching one, it's suppose…

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-05 Keith M
…o your document? The standard Tesseract language models were trained on corpora (Wiki articles? not sure) that have very different character frequencies and patterns from BASIC programs.

rgds,
Ben

On Monday, January 4, 2021 at 7:56:44 PM UTC-8 Keith M wrote:
Hello again Alex…
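Ben's point about character frequency can be made concrete by measuring the character mix of a text. The sketch below (function name `class_share` and both sample strings are hypothetical, not taken from Keith's listings) splits non-whitespace characters into coarse classes. It does not inspect Tesseract's models in any way; it only illustrates how differently a BASIC listing and ordinary prose are distributed.

```python
from collections import Counter
from typing import Dict


def class_share(text: str) -> Dict[str, float]:
    """Fraction of non-whitespace characters in each coarse character class."""
    classes: Counter = Counter()
    for ch in text:
        if ch.isspace():
            continue
        if ch.isdigit():
            classes["digit"] += 1
        elif ch.isupper():
            classes["upper"] += 1
        elif ch.islower():
            classes["lower"] += 1
        else:
            classes["other"] += 1  # punctuation, quotes, operators
    total = sum(classes.values())
    # Missing classes count as 0; an all-whitespace input yields all zeros.
    return {k: (classes[k] / total if total else 0.0)
            for k in ("digit", "upper", "lower", "other")}


if __name__ == "__main__":
    basic = '10 PRINT "HELLO"\n20 GOTO 10'
    prose = "The quick brown fox jumps over the lazy dog."
    for name, sample in (("BASIC", basic), ("prose", prose)):
        share = class_share(sample)
        print(name, {k: round(v, 2) for k, v in share.items()})
```

For these two toy samples, the BASIC snippet is about 27% digits with no lowercase letters at all, while the prose sample is about 94% lowercase. Running the same measurement over a few hand-corrected pages of the real listings would show how far they sit from typical prose.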

[tesseract-ocr] make training does nothing when run

2021-01-07 Keith M
I'm sure I'm making a beginner mistake here, but I'm struggling quite a bit. I've built straight from source, both versions 4.1.1 and 5.0.0, on Ubuntu 18.04 and Ubuntu 20.04 (a fresh install, never used, but properly updated). All exhibit the same behavior. I installed all the dependencies following…

[tesseract-ocr] Re: Microscopy label, poor recognition

2021-12-21 Keith M
Martin, I'd normally reply privately here, but I don't think that's an option given the Google Groups configuration. I know you didn't ask this specifically, but I ran your sample image, unmodified, through AWS Textract and got great results. I'm happy to run a small subset of images through it…