Re: [tesseract-ocr] Re: Tesseract training ground truth: I'm confused about the box files

2024-09-05 Thread Mateusz Matela
nop wrote:
Ehm:
1. Tesseract v3 (legacy) engine training is based on characters.
2. Tesseract LSTM engine (tesseract >= v4) training script is based on lines (groups of words).
Box files reflect that. And yes, box files are important.
Zdenko

On Fri, 12 Jul 2024 at 14:14, Mateusz Matela wrote
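To illustrate the character-versus-line distinction, here is a minimal sketch in Python. It assumes the common six-field box line (`symbol left bottom right top page`), and it assumes the LSTM-style layout as I understand it: every character carries the bounding box of its whole text line, and a tab entry marks the end of each line. The helper names and the sample coordinates are made up for illustration and do not come from the thread or from Tesseract itself.

```python
from itertools import groupby
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box-file line: 'symbol left bottom right top page'."""
    # Split from the right so a symbol that is itself a space or tab survives.
    symbol, *numbers = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, numbers)
    return Box(symbol, left, bottom, right, top, page)


def lines_from_lstm_boxes(box_lines):
    """Recover text lines from LSTM-style boxes (assumed layout, see above).

    Consecutive characters that share identical coordinates are treated as
    one text line; tab entries are end-of-line markers and are dropped.
    """
    boxes = [parse_box_line(raw) for raw in box_lines if raw.strip("\n")]
    text_lines = []
    for _, group in groupby(boxes, key=lambda b: b[1:]):
        text = "".join(b.symbol for b in group).replace("\t", "")
        if text:
            text_lines.append(text)
    return text_lines


# Hypothetical legacy-style data: one tight box per character.
LEGACY_SAMPLE = ["H 10 20 30 60 0", "i 34 20 40 60 0"]

# Hypothetical LSTM-style data: each character repeats its line's box.
LSTM_SAMPLE = [
    "H 10 20 40 60 0",
    "i 10 20 40 60 0",
    "\t 41 20 42 21 0",
    "o 10 70 40 110 0",
    "k 10 70 40 110 0",
    "\t 41 70 42 71 0",
]

if __name__ == "__main__":
    print([parse_box_line(raw) for raw in LEGACY_SAMPLE])
    print(lines_from_lstm_boxes(LSTM_SAMPLE))
```

In the legacy sample each box differs per character, while in the LSTM sample the coordinates only change when a new text line starts, which is why grouping by coordinates is enough to rebuild the lines here.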

[tesseract-ocr] Re: Tesseract training ground truth: I'm confused about the box files

2024-07-12 Thread Mateusz Matela
file and let the training script autogenerate them. In that case the reported error rates were crazy, like 99% instead of 0.5%. This suggests that conclusion 3 is correct.

On Wednesday, 10 July 2024 at 15:17:07 UTC+2, Mateusz Matela wrote:
> Hi all,
>
> Sorry if double posting, my previou

[tesseract-ocr] Tesseract training ground truth: I'm confused about the box files

2024-07-10 Thread Mateusz Matela
Hi all,

Sorry if double posting, my previous message didn't appear and I don't see any info about waiting for acceptance or something.

I was searching for this topic in this forum and it was mentioned a few times, but I couldn't find a clear and definitive explanation. How does the information