[tesseract-ocr] Issue with Fine-Tuning eng.traineddata on Large Dataset: Negative Mean RMS Error

2024-01-30 Thread Ilyas
Hello everyone, I've been successfully fine-tuning the eng.traineddata model with smaller datasets, but when I try to scale up to a larger dataset to include a more diverse range of documents, I encounter an unusual error. The training process starts, but it immediately reports a negative Mean

Re: [tesseract-ocr] Issue with Fine-Tuning eng.traineddata on Large Dataset: Negative Mean RMS Error

2024-01-30 Thread Ger Hobbelt
On Tue, 30 Jan 2024, 17:13 Ilyas wrote:
> The output I'm wondering about is:
> At iteration 1/600/600, Mean rms=-2147483.6%,

I don't know why or what is causing this; I just notice the value is quite remarkable, as it looks like INT32_MIN got fed into some per-mille/percentage calculation
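The INT32_MIN observation can be checked with simple arithmetic. The sketch below is only an illustration of the numerical coincidence: the divisor of 1000 and the one-decimal formatting are assumptions chosen to match the logged text, and the helper name `as_scaled_percent` is hypothetical, not taken from the Tesseract source.

```python
# Smallest value of a signed 32-bit integer.
INT32_MIN = -(2 ** 31)  # -2147483648


def as_scaled_percent(raw: int, divisor: int = 1000) -> str:
    """Format a raw integer as a percentage after dividing by `divisor`.

    The divisor of 1000 is an assumption for illustration only.
    """
    return f"{raw / divisor:.1f}%"


# Compare with the value in the training log: Mean rms=-2147483.6%
print(as_scaled_percent(INT32_MIN))
```

If the printed string matches the logged value, that supports the reading that an uninitialised or overflowed 32-bit integer reached the reporting code, though it does not say where in the training pipeline that happened.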

[tesseract-ocr] OCR of free hand photo of book

2024-01-30 Thread Borneq
First I tested Tesseract on a file generated as a flat image. I generated Lorem Ipsum text: 5 paragraphs, 452 words, 2978 bytes, 24 lines + 4 blank lines; the maximum line length in my editor was 135 chars. Result: 100% accurate except for two full stops, fantastic. Next, I rotated the image. Only 0.7 degrees caused

Re: [tesseract-ocr] Re: I need help to develop image to text extraction

2024-01-30 Thread Santhiya C
I have already installed pytesseract, but I got this error: Usage: pytesseract [-l lang] input_file. How do I fix this issue?

On Saturday 27 January 2024 at 14:08:01 UTC+5:30 zdenop wrote:
> 👍
>
> Zdenko
>
> On Sat, 27 Jan 2024 at 2:22, Ger Hobbelt wrote:
>
>> L.S.,
>>
>> *PDF. OCR. text extractio