Wildcard expansion (*.tr) is a shell/OS matter (see e.g. Windows [1]), so support for this feature depends on the shell and not on Tesseract.
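
If the shell in use does not expand *.tr, one possible workaround is to expand the pattern in a small script and pass the resulting file names explicitly. The following is only a minimal sketch, not something from the Tesseract tools themselves: the helper names build_command and run_training_tools are made up for illustration, and it assumes the three tools named in this thread are on the PATH and need no arguments beyond the .tr files (a real training run may well need more).

```python
import glob
import os
import subprocess


def build_command(tool, pattern="*.tr", directory="."):
    """Return [tool, file1, file2, ...] with the wildcard expanded by Python.

    Hypothetical helper: the expansion is done by glob, so it does not
    depend on what the calling shell does with the pattern.
    """
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    return [tool] + files


def run_training_tools(directory="."):
    """Run the tools mentioned in the thread on every .tr file in directory.

    Assumption: each tool accepts a plain list of .tr files. Any extra
    options a real training run requires are deliberately left out.
    """
    for tool in ("shapeclustering", "mftraining", "cntraining"):
        # A list argument (no shell=True) means no shell is involved at all.
        subprocess.run(build_command(tool, "*.tr", directory), check=True)
```

Calling build_command("mftraining") on its own returns the argument list without running anything, which makes it easy to check what would be executed.
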
[1] http://superuser.com/questions/460598/is-there-any-way-to-get-the-windows-cmd-shell-to-expand-wildcard-paths

Zdenko

On Fri, Feb 28, 2014 at 12:58 PM, Frederico Ferro Schuh <[email protected]> wrote:

> Thanks for the reply Bernard.
> It's good to know that my traineddata size is normal. I will now focus on
> improving my samples, hopefully I can improve the performance. Seems like a
> case of overtraining.
>
> The *.tr tip is a gem, really appreciate it :)
>
> Thanks again!
> Fred
>
>
> On Wednesday, February 26, 2014 8:19:28 PM UTC+8, Bernard Polarski wrote:
>>
>> If you do not include a word-dawg, freq-dawg then the only big file is
>> inttemp.
>> For 34000 characters I am surprised to see it at the size of around 100k.
>> However your 6000 represents only 10 digits so it is very possible.
>> As for the poor performance, I think that the size is very detrimental:
>> the characters are usually 20 to 40 pixels high and 20 to 50 wide (only for
>> 'm' or 'w').
>> Too much precision is not good.
>>
>> All the other files are usually rather small (pffmtable, normproto,
>> font_properties, shapetable, unicharset, unicharambigs)
>> and combined are less than 100k.
>>
>> In this respect your traineddata seems normal.
>>
>> Besides that you could write using wildcard:
>>
>> shapeclustering *.tr
>> mftraining *.tr
>> cntraining *.tr
>>
>>
>> On Tuesday, 25 February 2014 17:51:39 UTC+1, Frederico Ferro Schuh wrote:
>>
>>> Hello all,
>>>
>>> I'm training Tesseract to recognize handwritten digits, and I have
>>> provided it about 6000 samples of each digit, in 10 different box files,
>>> one for each digit. Each box file is a 2152x2152 TIF file. However, the
>>> resulting traineddata file I get after completing the training procedure is
>>> only 137 kb.
>>> I went through the process again, providing smaller sample files (1000
>>> samples of each digit), and ended up with the same traineddata size of 137
>>> kb.
>>> Is this size reasonable or am I doing something wrong?
>>> I assume something is wrong because my results are pretty bad so far.
>>>
>>> I've attached the sample image I am using for the digit 0.
>>>
>>> Thanks in advance,
>>> Fred
>>>

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

