Re: traineddata file size varies according to box file images?

zdenko podobny Fri, 28 Feb 2014 04:28:20 -0800

Wildcard expansion (*.tr) is done by the shell/OS, not by tesseract (see
e.g. Windows [1]), so whether this works depends on the shell you use.
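
Where the shell does not expand wildcards (e.g. Windows cmd), one workaround
is to expand them in a small wrapper script before calling the training tool.
The sketch below is only an illustration: the helper names expand_patterns
and run_tool are hypothetical, and it assumes the tool accepts a plain list
of .tr file names as arguments, as in the commands quoted below.

```python
import glob
import subprocess


def expand_patterns(patterns):
    """Expand wildcard patterns roughly the way a Unix shell would.

    A pattern that matches nothing is passed through unchanged,
    which mirrors the default behaviour of most Unix shells.
    """
    expanded = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        expanded.extend(matches if matches else [pattern])
    return expanded


def run_tool(tool, patterns):
    """Run a training tool (hypothetical wrapper) on the expanded file list."""
    command = [tool] + expand_patterns(patterns)
    # check=True raises CalledProcessError if the tool exits with an error.
    return subprocess.run(command, check=True)
```

For example, run_tool("mftraining", ["*.tr"]) would pass every .tr file in
the current directory to mftraining, whatever the shell does with "*.tr".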


[1]
http://superuser.com/questions/460598/is-there-any-way-to-get-the-windows-cmd-shell-to-expand-wildcard-paths

Zdenko


On Fri, Feb 28, 2014 at 12:58 PM, Frederico Ferro Schuh <
[email protected]> wrote:

> Thanks for the reply Bernard.
> It's good to know that my traineddata size is normal. I will now focus on
> improving my samples, hopefully I can improve the performance. Seems like a
> case of overtraining.
>
> The *.tr tip is a gem, really appreciate it :)
>
> Thanks again!
> Fred
>
>
> On Wednesday, February 26, 2014 8:19:28 PM UTC+8, Bernard Polarski wrote:
>>
>>
>> If you do not include a word-dawg or freq-dawg, then the only big file is
>> inttemp.
>> For 34000 characters I am surprised to see it at around 100k.
>> However, your 6000 samples cover only 10 digits, so it is quite possible.
>> As for the poor performance, I think the size is very detrimental:
>> characters are usually 20 to 40 pixels high and 20 to 50 wide (only for
>> 'm' or 'w').
>> Too much precision is not good.
>>
>> All the other files are usually rather small (pffmtable, normproto,
>> font_properties, shapetable, unicharset, unicharambigs)
>> and combined are less than 100k.
>>
>> In this respect your traineddata seems normal.
>>
>> Besides that, you could use wildcards:
>>
>>    shapeclustering *.tr
>>    mftraining *.tr
>>    cntraining *.tr
>>
>>
>> On Tuesday, February 25, 2014 5:51:39 PM UTC+1, Frederico Ferro Schuh wrote:
>>
>>> Hello all,
>>>
>>> I'm training Tesseract to recognize handwritten digits, and I have
>>> provided it with about 6000 samples of each digit, in 10 different box files,
>>> one for each digit. Each box file is a 2152x2152 TIF file. However, the
>>> resulting traineddata file I get after completing the training procedure is
>>> only 137 kb.
>>> I went through the process again, providing smaller sample files (1000
>>> samples of each digit), and ended up with the same traineddata size of 137
>>> kb.
>>> Is this size reasonable or am I doing something wrong?
>>> I assume something is wrong because my results are pretty bad so far.
>>>
>>> I've attached the sample image I am using for the digit 0.
>>>
>>> Thanks in advance,
>>> Fred
>>>
>>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
