see
http://www.devscope.net/Content/ocrchecks.aspx
https://github.com/BigPino67/Tesseract-MICR-OCR
https://groups.google.com/d/msg/tesseract-ocr/obWI4cz8rXg/6l82hEySgOgJ

On Mon, Jun 10, 2019 at 11:21 AM ElGato ElMago <elmagoelg...@gmail.com>
wrote:

> It would be nice if there were traineddata out there, but I didn't find
> any.  I see free fonts and commercial OCR software but no traineddata.  The
> tessdata repository obviously doesn't have one, either.
>
> On Saturday, June 8, 2019 at 1:52:10 AM UTC+9, shree wrote:
>>
>> Please also search for existing MICR traineddata files.
>>
>> On Thu, Jun 6, 2019 at 1:09 PM ElGato ElMago <elmago...@gmail.com> wrote:
>>
>>> So I did several tests from scratch.  In the last attempt, I made a
>>> training text with 4,000 lines in the following format,
>>>
>>> 110004310510<   <02 :4002=0181:801= 0008752 <00039 ;0000001000;
>>>
>>>
>>> and combined it with eng.digits.training_text, in which the symbols are
>>> converted to E13B symbols.  This makes about 12,000 lines of training
>>> text.  It's amazing that this generates a working reader out of
>>> nothing.  But it is not very accurate.  For example:
>>>
>>> <01 :1901=1386:021= 1111001<10001< ;0000090134;
>>>
>>> is the result for the attached image.  It's close, but the last '<' in the
>>> result text doesn't exist in the image.  It's a small failure, but it
>>> causes bigger trouble in parsing.
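
One way to pinpoint an extra or missing symbol like this is to diff the OCR result against the ground-truth text. The sketch below uses Python's difflib. The "expected" string is an assumption, formed by dropping the trailing '<' from the result above as described, since the attached image is not part of this archive. The function name is made up for illustration.

```python
import difflib


def ocr_diffs(expected, actual):
    """Return (tag, expected_part, actual_part) for every mismatch between two strings."""
    # autojunk=False keeps difflib from treating frequent characters as junk.
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # OCR result quoted above.
    actual = "<01 :1901=1386:021= 1111001<10001< ;0000090134;"
    # Assumed ground truth: the same line without the spurious '<'.
    expected = "<01 :1901=1386:021= 1111001<10001 ;0000090134;"
    for diff in ocr_diffs(expected, actual):
        print(diff)
```

Running this over a set of evaluation lines and counting the reported tuples per character could show whether one symbol is responsible for most of the wrong reads.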
>>>
>>> What would you suggest from here to increase accuracy?
>>>
>>>    - Increase the number of lines in the training text
>>>    - Mix up more variations in the training text
>>>    - Increase the number of iterations
>>>    - Investigate wrong reads one by one
>>>    - Or something else?
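
As a rough illustration of the "more variations" option, the sketch below generates randomized training lines using only the 14 characters that appear in the samples above: the digits plus `<`, `:`, `=` and `;` as ASCII stand-ins for the E13B symbols. The field layout, function names, length ranges and seed are all assumptions for illustration, not the layout used in this thread.

```python
import random

DIGITS = "0123456789"
SYMBOLS = "<:=;"  # ASCII stand-ins for the four E13B symbols, as in the sample lines


def random_field(rng, min_len=2, max_len=12):
    """Return a run of random digits with an optional leading and trailing symbol."""
    length = rng.randint(min_len, max_len)
    digits = "".join(rng.choice(DIGITS) for _ in range(length))
    # Choosing " " means "no symbol" on that side; strip() turns it into "".
    lead = rng.choice(SYMBOLS + " ").strip()
    trail = rng.choice(SYMBOLS + " ").strip()
    return f"{lead}{digits}{trail}"


def make_training_lines(count, seed=0, min_fields=3, max_fields=6):
    """Return `count` lines, each made of several space-separated random fields."""
    rng = random.Random(seed)  # fixed seed so the text can be regenerated
    lines = []
    for _ in range(count):
        n_fields = rng.randint(min_fields, max_fields)
        lines.append(" ".join(random_field(rng) for _ in range(n_fields)))
    return lines


if __name__ == "__main__":
    for line in make_training_lines(5):
        print(line)
```

Writing the returned lines to a file gives a training text that can be passed to tesstrain.sh via --training_text. Whether purely random fields help more than realistic check layouts is an open question here.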
>>>
>>> Also, I referred to engrestrict*.* and could get a similar result
>>> with the fine-tuning-from-full method.  It seems to reach the same level
>>> a bit faster, but it also stops at a 'good' level.  I can go either way
>>> as long as it gets me to a better result.
>>>
>>> Regards,
>>> ElMagoElGato
>>>
>>> On Thursday, May 30, 2019 at 3:56:02 PM UTC+9, ElGato ElMago wrote:
>>>>
>>>> Thanks a lot, Shree. I'll look into it.
>>>>
>>>> On Thursday, May 30, 2019 at 2:39:52 PM UTC+9, shree wrote:
>>>>>
>>>>> See https://github.com/Shreeshrii/tessdata_shreetest
>>>>>
>>>>> Look at the files engrestrict*.* and also
>>>>> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
>>>>>
>>>>> Create a training text of about 100 lines and fine-tune for 400
>>>>> iterations.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <elmago...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I had about 14 lines as attached.  How many lines would you recommend?
>>>>>>
>>>>>> Fine tuning gives a much better result, but it tends to pick characters
>>>>>> outside E13B, which has only 14 characters: 0 through 9 and 4
>>>>>> symbols.  I thought training from scratch would eliminate such confusion.
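
To measure how often a fine-tuned model strays outside the E13B set, a minimal check could look like the sketch below. It assumes the four symbols are represented by the ASCII stand-ins `<`, `:`, `=` and `;` seen in the sample lines elsewhere in this thread. The function name is hypothetical.

```python
ALLOWED = set("0123456789<:=;")  # assumed ASCII stand-ins for the 14 E13B characters


def unexpected_chars(text):
    """Return the sorted characters in text that are neither whitespace nor in ALLOWED."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in ALLOWED})


if __name__ == "__main__":
    # Made-up OCR output with a letter 'O' read in place of a zero.
    print(unexpected_chars("<01 :19O1=1386:021="))
```

Applied to the text that tesseract writes for each test image, an empty list means the output stayed within the 14-character set. It says nothing about whether the characters themselves are correct.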
>>>>>>
>>>>>> On Thursday, May 30, 2019 at 10:43:08 AM UTC+9, shree wrote:
>>>>>>>
>>>>>>> For training from scratch a large training text and hundreds of
>>>>>>> thousands of iterations are recommended.
>>>>>>>
>>>>>>> If you are just fine tuning for a font, try to follow the tutorial's
>>>>>>> instructions for fine tuning for Impact, using your font.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <elmago...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks, Shree.
>>>>>>>>
>>>>>>>> Yes, I saw the instructions.  The steps I took are as follows:
>>>>>>>>
>>>>>>>> Using tesstrain.sh:
>>>>>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>>>>>   --linedata_only \
>>>>>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>>>>>   --tessdata_dir ./tessdata \
>>>>>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>>>>>
>>>>>>>> Training from scratch:
>>>>>>>> mkdir -p ~/tesstutorial/e13boutput
>>>>>>>> src/training/lstmtraining --debug_interval 100 \
>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>>>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>>>>>
>>>>>>>> Test with base_checkpoint:
>>>>>>>> src/training/lstmeval \
>>>>>>>>   --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>>>>>
>>>>>>>> Combining output files:
>>>>>>>> src/training/lstmtraining --stop_training \
>>>>>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>>>>>
>>>>>>>> Test with eng.traineddata:
>>>>>>>> tesseract e13b.png out \
>>>>>>>>   --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>>>>>
>>>>>>>>
>>>>>>>> The training from scratch ended as:
>>>>>>>>
>>>>>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%,
>>>>>>>> word train=0%, skip ratio=0%,  New best char error = 0 wrote best
>>>>>>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote
>>>>>>>> checkpoint.
>>>>>>>>
>>>>>>>>
>>>>>>>> The test with base_checkpoint returns nothing as:
>>>>>>>>
>>>>>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>>>>>
>>>>>>>>
>>>>>>>> The test with eng.traineddata and e13b.png produces out.txt.  Both
>>>>>>>> files are attached.
>>>>>>>>
>>>>>>>> Training seems to have worked fine.  I don't know how to interpret
>>>>>>>> the test result from base_checkpoint.  The generated eng.traineddata
>>>>>>>> obviously doesn't work well.  I suspect the choice of --traineddata
>>>>>>>> when combining the output files is wrong, but I have no clue.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> ElMagoElGato
>>>>>>>>
>>>>>>>> BTW, I referred to your tess4training in the process.  It helped a
>>>>>>>> lot.
>>>>>>>>
>>>>>>>> On Wednesday, May 29, 2019 at 7:14:08 PM UTC+9, shree wrote:
>>>>>>>>>
>>>>>>>>> see
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>>>>>>
>>>>>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <elmago...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I wish to make traineddata for the E13B font.
>>>>>>>>>>
>>>>>>>>>> I read the training tutorial and made a base_checkpoint file
>>>>>>>>>> according to the method in Training From Scratch.  Now, how can I
>>>>>>>>>> make traineddata from the base_checkpoint file?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to tesser...@googlegroups.com.
>>>>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> ____________________________________________________________
>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>

