Re: [tesseract-ocr] How to continue training with Makefile + EPOCHS

2023-12-05 Thread Keith Smith
From one novice to another ... 1. Yes, that is my understanding of how to run further iterations. 2. Yes, EPOCHS says to iterate that many times over your set of tests. I think I have heard the recommended number of EPOCHS in general is 2, though I don't know how much science is behind that. I

Re: [tesseract-ocr] Any success story?

2023-11-14 Thread Keith Smith
The short answer is "no", but a fuller answer is that my use case is a bit different from others and is as follows ... I trained tesseract to read the MICR line at the bottom of bank checks using only 20K checks (i.e. real data, not synthetic). I was able to get 85% accuracy where the reason for

Re: [tesseract-ocr] LSTM-based training produces .box files with the same coordinates

2023-11-01 Thread Keith Smith
fyi, I asked the same question in https://groups.google.com/g/tesseract-ocr/c/9myrnSD0HKM On Wednesday, November 1, 2023 at 7:21:37 AM UTC-4 zdenop wrote: > Are you following official tutorials? > Did you read the documentation? > Have you tried to check the official training repository and pro

Re: Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-23 Thread Keith Smith
Hi Keith, The foo.traineddata does not exist, but do you mean: the

Re: [tesseract-ocr] Lessons, best practices, recommendations, strategies, hacks

2023-10-21 Thread Keith Smith
contents of the google doc could be submitted as a PR to the tesstrain repo. Again, just a suggestion that I hope would be helpful to all. Thanks, Keith On Sat, Oct 21, 2023 at 8:28 AM Des Bw wrote: > There is no exhaustive user manual for training tesseract. We all start in > the darknes

[tesseract-ocr] Error using tesstrain with START_MODEL - failed to continue

2023-10-19 Thread Keith Smith
as the START_MODEL for my training, but I am hitting the error I mentioned above. Is my approach incorrect? If yes, can you please direct me? I am not finding the documentation extremely clear, so I obviously may be doing something stupid. Thanks much for the help, Keith BTW, I am attaching

[tesseract-ocr] tesstrain help needed - failed to continue

2023-10-19 Thread Keith Smith
ust unzip it in the tesstrain directory and run the 3 commands mentioned above. Thanks, Keith -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email

Re: [tesseract-ocr] How to generate training images with noise

2023-10-18 Thread Keith Smith
ooking at https://github.com/tesseract-ocr/tesstrain/blob/main/generate_line_box.py#L26 Shouldn't the box file coordinates be different for each character? Thanks, Keith On Fri, Oct 13, 2023 at 10:59 AM Keith Smith wrote: > Thanks Shree for the clarification. I'll give it a t

Re: [tesseract-ocr] How to generate training images with noise

2023-10-13 Thread Keith Smith
/tesseract-ocr/tesstrain/wiki > > It has details about training using the makefile. > > On Fri, Oct 13, 2023, 3:43 PM Keith Smith wrote: > >> Yes I have. I am asking about how to automate the generation of the >> ground truth images and box files, because from what I understand,

Re: [tesseract-ocr] How to generate training images with noise

2023-10-13 Thread Keith Smith
/tesstrain assumes the ground truth (images + box files) already exist. On Fri, Oct 13, 2023 at 1:00 AM Shree Devi Kumar wrote: > Have you looked at > > https://github.com/tesseract-ocr/tesstrain > > > > On Thu, Oct 12, 2023, 11:45 PM Keith Smith > wrote: > >>

[tesseract-ocr] How to generate training images with noise

2023-10-12 Thread Keith Smith
best methodology to use? Is there a way to get text2image (or another tool) to generate less-than-perfect images? Or can someone suggest a less labor intensive way of using real check images to train tesseract? Thanks in advance, Keith

[tesseract-ocr] OCR various fields of bank check in TIFF format

2023-08-08 Thread Keith Smith
andard way of converting the "legalAmount" to a numeric value? 3. The results that I am getting for the MICR line fields are horrible. What is recommended for best results? These checks are E13B format. 4. If I need to do my own training, what is the best way to create the ground truth

[tesseract-ocr] Re: Microscopy label, poor recognition

2021-12-21 Thread Keith M
bset of images through it if you have a wide range of inputs, quality of images, etc. Please contact me off-list keith a_ t_ techtravels dot org. Thanks, Keith On Tuesday, December 21, 2021 at 5:08:21 AM UTC-5 mrw...@googlemail.com wrote: > I have an image (label of a microscopy slid

Re: [tesseract-ocr] make training does nothing when run

2021-01-08 Thread Keith
o support training, but no mention about actually installing it. Do you think that's worthy of filing an issue? I'm probably not the only bonehead out there. Thanks, Keith On Fri, Jan 8, 2021 at 3:12 AM Shree Devi Kumar wrote: > >After placing the groundtruth files in a folder cal

[tesseract-ocr] make training does nothing when run

2021-01-07 Thread Keith M
er, I tried putting it in /usr/local/share/tessdata. eng.traineddata has been copied to the tessdata folder. There's something obvious I'm doing wrong, but heck if I can find it. Help!@# Keith

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-05 Thread Keith M
quick scan. I must admit this is a pretty cool problem space. Thanks, Keith On 1/5/2021 12:28 PM, Ben Bongalon wrote: Hi Keith, Interesting project. Having looked at the sample OCR results that Alex posted, I think the poor recognition from Tesseract is more likely due to the underlying

Re: [tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-04 Thread Keith M
to send it there. But would they want it? I will type up a blog post detailing some of this, because there's no sense in NOT writing this down after all the research. Thanks, Keith P.S. Yes, simply typing the 100 page document in, or paying someone to do so would be faster and cheaper. B

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread Keith M
'm done), but I think it's neat, and I like learning about new technology. Hope the group finds this info useful. Thanks, Keith On Friday, January 1, 2021 at 11:32:40 PM UTC-5 Keith M wrote: > Ger, > > Thanks for taking the time to reply. > > On 1/1/2021 4:00 PM, Ge

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread Keith M
ally it should help, practically speaking I saw only minimal improvement. While it's still a work in progress, I'm describing my current best efforts/results in the other reply here. Thanks, Keith On Friday, January 1, 2021 at 10:03:37 PM UTC-5 shree wrote: > Please see old thread at

[tesseract-ocr] advice for OCR'ing 9-pin dot matrix BASIC code

2020-12-13 Thread Keith M
to eng.user-words? From the manual "CONFIG FILES AND AUGMENTING WITH USER DATA" section ??* I could use some help, thanks! Keith

Re: [tesseract-ocr] Extract Graphics from Video and get text with OCR

2015-09-21 Thread Keith Reilly
hree thanks for pointing out the whitelist. I didn't know that existed; I'm sure my results will get better once I get it to work. Keith On Wednesday, September 16, 2015 at 12:13:38 AM UTC-4, Dmitri Silaev wrote: > > Text color - somehow you need to replicate or take into account the

Re: [tesseract-ocr] Extract Graphics from Video and get text with OCR

2015-09-15 Thread Keith Reilly
bly wouldn't need training. > > Completely different approach is to use fixed pattern matching. Go find my > post about pulling text out of game screenshots. You'll need to program > yourself then. > > The last thing I'd try is training. Wiki is your friend.

Project starting in NH, need assistance

2013-01-30 Thread Keith B
I'm working on a project where we need someone local to New Hampshire with Tesseract experience. Any ideas where to find someone?