Thanks for the reply, Bernard.
It's good to know that my traineddata size is normal. I will now focus on 
improving my samples, which will hopefully improve the performance too. It 
seems like a case of overtraining.

The *.tr tip is a gem, really appreciate it :)
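
For anyone trying the wildcard form, here is a rough Python sketch of what the shell does with `*.tr`: it expands the pattern into the list of matching files before the tool runs. The function name `build_training_commands` is made up for illustration, and the commands are the bare forms from the quoted message, so any extra options a given Tesseract version needs are left out.

```python
import glob
import os


def build_training_commands(directory="."):
    """Return the three training command lines with *.tr expanded to file names.

    Hypothetical helper for illustration only; it builds the commands
    but does not run them.
    """
    # sorted() gives a stable order; a shell also sorts its glob matches.
    tr_files = sorted(glob.glob(os.path.join(directory, "*.tr")))
    if not tr_files:
        # Nothing matched, so there is nothing to train on.
        return []
    # Bare tool names as written in the quoted message (no extra options).
    tools = ["shapeclustering", "mftraining", "cntraining"]
    return [[tool] + tr_files for tool in tools]


if __name__ == "__main__":
    for command in build_training_commands("."):
        print(" ".join(command))
```

Printing the commands first is a cheap way to check that every expected .tr file is picked up before running the real tools.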

Thanks again!
Fred

On Wednesday, February 26, 2014 8:19:28 PM UTC+8, Bernard Polarski wrote:
>
>  
> If you do not include a word-dawg or freq-dawg, then the only big file is 
> inttemp. 
> For 34000 characters I am surprised to see it at a size of around 100k.
> However, your 6000 samples represent only 10 digits, so it is quite possible.
> As for the poor performance, I think that the size is very detrimental: 
> characters are usually 20 to 40 pixels high and 20 to 50 wide (50 only for 
> 'm' or 'w'). 
> Too much precision is not good.
>  
> All the other files are usually rather small (pffmtable, normproto, 
> font_properties, shapetable, unicharset, unicharambigs)
> and combined are less than 100k.
>  
> In this respect your traineddata seems normal.
>  
> Besides that, you could use a wildcard:
>  
>    shapeclustering *.tr
>    mftraining *.tr
>    cntraining *.tr
>  
>  
> On Tuesday, February 25, 2014 5:51:39 PM UTC+1, Frederico Ferro Schuh wrote:
>
>> Hello all, 
>>
>> I'm training Tesseract to recognize handwritten digits, and I have 
>> provided it with about 6000 samples of each digit, in 10 different box 
>> files, one for each digit. Each box file goes with a 2152x2152 TIF image. 
>> However, the resulting traineddata file I get after completing the 
>> training procedure is only 137 KB.
>> I went through the process again, providing smaller sample files (1000 
>> samples of each digit), and ended up with the same traineddata size of 137 
>> KB.
>> Is this size reasonable, or am I doing something wrong?
>> I assume something is wrong because my results are pretty bad so far.
>>
>> I've attached the sample image I am using for the digit 0.
>>
>> Thanks in advance,
>> Fred
>>
>
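
As a rough sanity check on the sample size discussed above, the sketch below estimates how many pixels each sample gets in a 2152x2152 image holding about 6000 samples. It assumes the samples sit on a uniform square grid that fills the whole image with no margins, which the thread does not state; `approx_cell_side` is a hypothetical helper name.

```python
import math


def approx_cell_side(image_side_px, samples):
    """Side length in pixels of one square cell, assuming `samples` equal
    cells tile a square image of side `image_side_px` with no margins."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    # Image area / samples = area per cell; the square root gives the side.
    return image_side_px / math.sqrt(samples)


if __name__ == "__main__":
    # Numbers taken from the original question: 2152x2152 image, ~6000 samples.
    side = approx_cell_side(2152, 6000)
    print(f"about {side:.1f} px per sample side")
```

Under that assumption each sample comes out at roughly 27.8 pixels per side, which can then be compared with the 20 to 40 pixel height mentioned in the quoted reply.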

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
