Hi Tom,

I appreciate your suggestions. I hadn't considered the possibility of bad 
data in the larger dataset contributing to this error. I'll start with that 
to see whether any specific subset is causing the problem. I'll report back 
with what I find from these tests.
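
For reference, here is a minimal sketch of the chunking test Tom describes. It assumes the training files are listed one per line in a plain text file; the function names, the output file naming, and the default chunk size of 100,000 are hypothetical choices for illustration, not part of Tesseract.

```python
from pathlib import Path


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_chunk_lists(list_file, out_dir, size=100_000):
    """Write one list file per chunk and return the paths written.

    `list_file` is assumed to hold one .lstmf path per line (hypothetical layout).
    """
    # Skip blank lines so they do not count as training files.
    lines = [ln for ln in Path(list_file).read_text().splitlines() if ln.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, part in enumerate(chunk(lines, size)):
        path = out / f"chunk_{n:03d}.txt"
        path.write_text("\n".join(part) + "\n")
        paths.append(path)
    return paths
```

Each chunk list can then be used as the training list for a separate run. If every chunk trains without the error, the same helper with a larger `size` can be used to look for the upper limit that still works.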

Thank you for pointing me in this direction.

Best,
Ilyas
On Wednesday, January 31, 2024 at 18:59:03 UTC+1, tfmo...@gmail.com wrote:

> On Tuesday, January 30, 2024 at 11:13:06 AM UTC-5 Ilyas wrote:
>
>
> The output I'm wondering about is:
> At iteration 1/600/600, Mean rms=-2147483.6%, delta=0.033%, char 
> train=275.696%, word train=100%, skip ratio=0%,  New worst char error = 
> 275.696 wrote checkpoint.
>
> I expected the training process to proceed normally with the Mean RMS 
> error showing sensible values, similar to when training on smaller 
> datasets. When I use around 100k lstmf files it doesn't have this behaviour 
> but with 400k this happens.
>
> Am I looking in the wrong direction or missing something?
>
>
> As Ger pointed out, the underflow is likely the symptom of a bug, but no 
> one is likely to be able to help much without a much smaller reproducer.
>
> The first thing I'd try would be to eliminate possible bad data in the 
> 300K new files as a source of the error. Can you run 100K chunks of the 
> added files separately without any error?
>
> If that works, I'd try to figure out the upper limit that works - 200K? 
> 300K? 350K? Perhaps you'll find an upper bound that's high enough for your 
> use case and you can avoid the hard work of tracking down the bug.
>
> There's unlikely to be any easy way to figure out what's going on.
>
> Tom
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/4c9f6135-14c0-4cdc-afaa-154d4df4248en%40googlegroups.com.
