Do you think training one character per file is affecting my results?

I was doing it because I have thousands of samples, and makebox always 
makes too many wrong guesses. If I had all the digits in the same image, 
fixing the resulting 10k-character box file manually would take forever. On 
the other hand, fixing a single-digit box file only takes a simple regexp 
replace on that file (one replace for digit 1, another replace for digit 2, 
and so on). 
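
For reference, here is a minimal sketch of that per-digit relabeling in 
Python. It assumes the usual Tesseract 3.x box line layout 
("<symbol> <left> <bottom> <right> <top> <page>"); the function name 
relabel_box_lines is just an illustrative name, not part of any Tesseract 
tool.

```python
import re


def relabel_box_lines(box_text: str, label: str) -> str:
    """Replace the leading symbol of every box line with `label`.

    Assumes each non-empty line has the form
    "<symbol> <left> <bottom> <right> <top> <page>".
    Coordinates and page number are left untouched.
    """
    # (?m) makes ^ match at the start of every line; \S+ is whatever
    # symbol makebox guessed; the lookahead keeps the following space.
    # A lambda is used so `label` is inserted literally.
    return re.sub(r"(?m)^\S+(?= )", lambda _match: label, box_text)


if __name__ == "__main__":
    # Hypothetical makebox output for an image that contains only zeros.
    guessed = "O 10 20 30 40 0\n8 50 20 70 40 0\n"
    print(relabel_box_lines(guessed, "0"), end="")
```

The same call would be repeated once per digit file, passing "1", "2", and 
so on as the label.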

Also, the goal of my application is online OCR: recognizing single 
lines of handwritten digits as the user draws them. Would this affect the 
format of my sample image(s) as well?

Thanks,
Fred


On Friday, February 28, 2014 10:58:05 PM UTC+8, Quan Nguyen wrote:
>
> I'm not sure having only samples of one character in a file is a good 
> idea. I normally train with all the characters in the same image(s).
>
> Check 
> http://code.google.com/p/tesseract-ocr/downloads/detail?name=boxtiff-2.01.eng.tar.gz
> for an example.
>
> On Tuesday, February 25, 2014 10:51:39 AM UTC-6, Frederico Ferro Schuh 
> wrote:
>>
>> Hello all,
>>
>> I'm training Tesseract to recognize handwritten digits, and I have 
>> provided it with about 6000 samples of each digit, in 10 different box 
>> files, one for each digit. Each box file is a 2152x2152 TIF file. However, 
>> the resulting traineddata file I get after completing the training 
>> procedure is only 137 KB.
>> I went through the process again, providing smaller sample files (1000 
>> samples of each digit), and ended up with the same traineddata size of 137 
>> KB.
>> Is this size reasonable or am I doing something wrong?
>> I assume something is wrong because my results are pretty bad so far.
>>
>> I've attached the sample image I am using for the digit 0.
>>
>> Thanks in advance,
>> Fred
>>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
