Re: [tesseract-ocr] Trained data for E13B font

ElGato ElMago Tue, 23 Jul 2019 01:10:50 -0700

It's great! Perfect!  Thanks a lot!

On Tuesday, 23 July 2019 at 10:56:58 UTC+9, shree wrote:
>
> See https://github.com/tesseract-ocr/tesseract/issues/2580
>
> On Tue, 23 Jul 2019, 06:23 ElGato ElMago, <elmago...@gmail.com> wrote:
>
>> Hi,
>>
>> I read the hOCR output with lstm_choice_mode = 4, as described in pull
>> request 2554.  It shows the candidates for each character but doesn't show
>> the bounding box of each character.  It only shows the box for a whole word.
>>
>> I see bounding boxes of each character in comments of the pull request 
>> 2576.  How can I do that?  Do I have to look in the source code and 
>> manipulate such an output on my own?
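For the word-level boxes mentioned above, here is a minimal Python sketch of reading them out of hOCR text. In hOCR, geometry is carried in an element's title attribute as "bbox x0 y0 x1 y1". The sample string is hand-written for illustration and is not real Tesseract output, and hocr_bboxes is a hypothetical helper name. Whether per-character elements are present depends on the Tesseract build and options, which this sketch does not cover.

```python
import re

# Matches the "bbox x0 y0 x1 y1" property that hOCR stores in title attributes.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

def hocr_bboxes(hocr_text):
    """Return every bbox found in the text as (x0, y0, x1, y1) integer tuples."""
    return [tuple(int(v) for v in m.groups()) for m in BBOX_RE.finditer(hocr_text)]

# Hand-written sample line (an assumption, not copied from Tesseract output).
sample = ("<span class='ocrx_word' id='word_1_1' "
          "title='bbox 36 92 96 116; x_wconf 95'>01</span>")

print(hocr_bboxes(sample))  # prints [(36, 92, 96, 116)]
```

The same function would pick up any finer-grained elements too, provided they also use the "bbox" property.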
>>
>> On Friday, 19 July 2019 at 18:40:49 UTC+9, ElGato ElMago wrote:
>>
>>> Lorenzo,
>>>
>>> I haven't been checking psm too much.  Will turn to those options after 
>>> I see how it goes with bounding boxes.
>>>
>>> Shree,
>>>
>>> I see the merges in the git log and also see that the new option
>>> lstm_choice_amount works now.  I guess my executable is the latest,
>>> though I still see the phantom character.  hOCR produces huge and complex
>>> output.  I'll take some time to read it.
>>>
>>> On Friday, 19 July 2019 at 18:20:55 UTC+9, Claudiu wrote:
>>>>
>>>> Is there any way to pass bounding boxes to the LSTM? We have an
>>>> algorithm that cleanly gets bounding boxes of MRZ characters. However, the
>>>> results using psm 10 are worse than passing the whole line in. Yet when we
>>>> pass the whole line in we get these phantom characters.
>>>>
>>>> Should PSM 10 mode work? It often returns “no character” where there 
>>>> clearly is one. I can supply a test case if it is expected to work well. 
>>>>
>>>> On Fri, Jul 19, 2019 at 11:06 AM ElGato ElMago <elmago...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Lorenzo,
>>>>>
>>>>> We both have got the same case.  It seems a solution to this problem 
>>>>> would save a lot of people.
>>>>>
>>>>> Shree,
>>>>>
>>>>> I pulled the current head of the master branch, but it doesn't seem to
>>>>> contain the merges you pointed to, which were merged 3 to 4 days ago.
>>>>> How can I get them?
>>>>>
>>>>> ElMagoElGato
>>>>>
>>>>> On Friday, 19 July 2019 at 17:02:53 UTC+9, Lorenzo Blz wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> PSM 7 was a partial solution for my specific case: it improved the
>>>>>> situation but did not solve it. Also, I could not use it in some other
>>>>>> cases.
>>>>>>
>>>>>> The proper solution is very likely more training with more data; some
>>>>>> data augmentation might help if data is scarce.
>>>>>> Also, less training might help if the training is not done correctly.
>>>>>>
>>>>>> There are also similar issues on github:
>>>>>>
>>>>>> https://github.com/tesseract-ocr/tesseract/issues/1465
>>>>>> ...
>>>>>>
>>>>>> The LSTM engine works like this: it scans the image and, for each
>>>>>> "pixel column", outputs a character, giving a sequence like this:
>>>>>>
>>>>>> M M M M N M M M [BLANK] F F F F
>>>>>>
>>>>>> (here I report only the highest-probability character for each column)
>>>>>>
>>>>>> In the example above an M is partially seen as an N. This is normal,
>>>>>> and another step of the algorithm (beam search, I think) tries to
>>>>>> aggregate the columns back into the correct characters.
>>>>>>
>>>>>> I think cases like this:
>>>>>>
>>>>>> M M M N N N M M
>>>>>>
>>>>>> are what give the phantom characters. More training should reduce
>>>>>> the source of the problem, or a painful analysis of the bounding boxes
>>>>>> might fix some cases.
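To make the per-column picture above concrete, here is a small Python sketch of the simplest possible aggregation step: merge consecutive repeats, then drop blanks (greedy CTC-style decoding). This is an illustration only and is not Tesseract's decoder, which, as guessed above, uses a beam search over probabilities rather than the single best label per column. The function name greedy_collapse is made up for this example.

```python
BLANK = "[BLANK]"

def greedy_collapse(frames):
    """Merge consecutive repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in frames:
        # Emit a label only when it differs from the previous column's label.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# The two column sequences quoted in the message above.
first = "M M M M N M M M [BLANK] F F F F".split()
second = "M M M N N N M M".split()

print(greedy_collapse(first))   # prints MNMF
print(greedy_collapse(second))  # prints MNM
```

With this naive rule a single M comes out as "MNM" in both cases, which is the shape of the phantom-character problem; a decoder that weighs probabilities is meant to suppress the short N run, and the longer the run, the harder that becomes.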
>>>>>>
>>>>>>
>>>>>> I used the attached script for the boxes.
>>>>>>
>>>>>>
>>>>>> Lorenzo
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 19 Jul 2019 at 07:25, ElGato ElMago <elmago...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Let's call them phantom characters then.
>>>>>>>
>>>>>>> Was psm 7 the solution for issue 1778?  None of the psm options
>>>>>>> solved my problem, though I see different output.
>>>>>>>
>>>>>>> I mostly use tesseract 5.0-alpha, but 4.1 showed the same results
>>>>>>> anyway.  How did you get the bounding box for each character?  Alto and
>>>>>>> lstmbox only show the bbox for a group of characters.
>>>>>>>
>>>>>>> ElMagoElGato
>>>>>>>
>>>>>>> On Wednesday, 17 July 2019 at 18:58:31 UTC+9, Lorenzo Blz wrote:
>>>>>>>
>>>>>>>> Phantom characters here for me too:
>>>>>>>>
>>>>>>>> https://github.com/tesseract-ocr/tesseract/issues/1778
>>>>>>>>
>>>>>>>> Are you using 4.1? Bounding boxes were fixed in 4.1, so maybe this was
>>>>>>>> also improved.
>>>>>>>>
>>>>>>>> I wrote some code that uses the symbol iterator to discard symbols
>>>>>>>> that are clearly duplicated: too small, overlapping, etc. But it was
>>>>>>>> not easy to make it work decently, and it is not 100% reliable: it
>>>>>>>> gives both false negatives and false positives. I cannot share the
>>>>>>>> code, and it is quite ugly anyway.
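Since the code described above was not shared, the following is only a guess at the general idea, written as a self-contained Python sketch: given per-symbol boxes, drop symbols that are too thin or that mostly overlap the previously kept symbol. The Symbol type, the function names, and both thresholds (min_width, max_overlap) are assumptions made for this example and would need tuning on real data.

```python
from typing import List, NamedTuple

class Symbol(NamedTuple):
    text: str
    left: int
    top: int
    right: int
    bottom: int

def overlap_ratio(a: Symbol, b: Symbol) -> float:
    """Horizontal overlap as a fraction of the narrower box's width."""
    inter = min(a.right, b.right) - max(a.left, b.left)
    narrower = min(a.right - a.left, b.right - b.left)
    if inter <= 0 or narrower <= 0:
        return 0.0
    return inter / narrower

def drop_phantoms(symbols: List[Symbol], min_width: int = 4,
                  max_overlap: float = 0.5) -> List[Symbol]:
    """Keep symbols left to right, skipping thin or heavily overlapping ones."""
    kept: List[Symbol] = []
    for sym in sorted(symbols, key=lambda s: s.left):
        if sym.right - sym.left < min_width:
            continue  # too thin to be a real character
        if kept and overlap_ratio(kept[-1], sym) > max_overlap:
            continue  # sits on top of the symbol already kept
        kept.append(sym)
    return kept

# Made-up boxes: a "0" with an overlapping "O", a "1", and a thin "<".
line = [
    Symbol("0", 10, 5, 30, 40),
    Symbol("O", 12, 5, 31, 40),
    Symbol("1", 34, 5, 50, 40),
    Symbol("<", 51, 5, 53, 40),
]

print("".join(s.text for s in drop_phantoms(line)))  # prints 01
```

As noted above, a filter like this cannot be fully reliable: keeping the first of two overlapping symbols is arbitrary, and a genuinely narrow character can be discarded by the width test.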
>>>>>>>>
>>>>>>>> Here is another MRZ model with training data:
>>>>>>>>
>>>>>>>> https://github.com/DoubangoTelecom/tesseractMRZ
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Lorenzo
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 17 Jul 2019 at 11:26, Claudiu <csaf...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I’m getting the “phantom character” issue as well using the OCRB
>>>>>>>>> that Shree trained on MRZ lines. For example, for a 0 it will
>>>>>>>>> sometimes add both a 0 and an O to the output, thus outputting 45
>>>>>>>>> characters in total instead of 44. I haven’t looked at the bounding
>>>>>>>>> box output yet, but I suspect a phantom thin character is added
>>>>>>>>> somewhere that I can discard, or maybe two chars will have the same
>>>>>>>>> bounding box. If anyone else has fixed this issue further up (e.g. so
>>>>>>>>> the output doesn’t contain the phantom characters in the first place),
>>>>>>>>> I’d be interested.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 17, 2019 at 10:01 AM ElGato ElMago <
>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'll go back to more training later.  Before doing so, I'd like to
>>>>>>>>>> investigate the results a little.  The hocr and lstmbox options give
>>>>>>>>>> some details of the positions of characters.  The results show
>>>>>>>>>> positions that perfectly correspond to letters in the image.  But the
>>>>>>>>>> text output contains a character that obviously does not exist.
>>>>>>>>>>
>>>>>>>>>> Then I found a config file 'lstmdebug' that generates far more
>>>>>>>>>> information.  I hope it explains what happened with each character.
>>>>>>>>>> I have yet to read the debug output, but I'd appreciate it if someone
>>>>>>>>>> could tell me how to read it because it's really complex.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> ElMagoElGato
>>>>>>>>>>
>>>>>>>>>> On Friday, 14 June 2019 at 19:58:49 UTC+9, shree wrote:
>>>>>>>>>>
>>>>>>>>>>> See https://github.com/Shreeshrii/tessdata_MICR
>>>>>>>>>>>
>>>>>>>>>>> I have uploaded my files there. 
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/Shreeshrii/tessdata_MICR/blob/master/MICR.sh
>>>>>>>>>>> is the bash script that runs the training.
>>>>>>>>>>>
>>>>>>>>>>> You can modify as needed. Please note this is for legacy/base 
>>>>>>>>>>> tesseract --oem 0.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 14, 2019 at 1:26 PM ElGato ElMago <
>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot, shree.  It seems you know everything.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried the MICR0.traineddata and the first two mcr.traineddata
>>>>>>>>>>>> files.  The last one was blocked by the browser.  Each of the
>>>>>>>>>>>> traineddata files had mixed results.  All of them get the symbols
>>>>>>>>>>>> fairly well but insert spaces randomly and read some numbers wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> MICR0 seems the best among them.  Did you suggest that you'd be
>>>>>>>>>>>> able to update it?  It very often gets a triple D where there's
>>>>>>>>>>>> only one, and so on.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I tried to fine-tune from MICR0 but found that I need to
>>>>>>>>>>>> change language-specific.sh.  It specifies some parameters for
>>>>>>>>>>>> each language.  Do you have any guidance for it?
>>>>>>>>>>>>
>>>>>>>>>>>> On Friday, 14 June 2019 at 01:48:40 UTC+9, shree wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> see 
>>>>>>>>>>>>> http://www.devscope.net/Content/ocrchecks.aspx 
>>>>>>>>>>>>> https://github.com/BigPino67/Tesseract-MICR-OCR
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/obWI4cz8rXg/6l82hEySgOgJ
>>>>>>>>>>>>>  
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 10, 2019 at 11:21 AM ElGato ElMago <
>>>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It would be nice if there were traineddata out there, but I
>>>>>>>>>>>>>> didn't find any.  I see free fonts and commercial OCR software
>>>>>>>>>>>>>> but not traineddata.  The tessdata repository obviously doesn't
>>>>>>>>>>>>>> have one, either.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Saturday, 8 June 2019 at 01:52:10 UTC+9, shree wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please also search for existing MICR traineddata files.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 6, 2019 at 1:09 PM ElGato ElMago <
>>>>>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So I did several tests from scratch.  In the last attempt,
>>>>>>>>>>>>>>>> I made a training text with 4,000 lines in the following
>>>>>>>>>>>>>>>> format,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 110004310510<   <02 :4002=0181:801= 0008752 <00039 ;0000001000;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and combined it with eng.digits.training_text, in which
>>>>>>>>>>>>>>>> symbols are converted to E13B symbols.  This makes about
>>>>>>>>>>>>>>>> 12,000 lines of training text.  It's amazing that this thing
>>>>>>>>>>>>>>>> generates a good reader out of nowhere.  But then it is not
>>>>>>>>>>>>>>>> very good.  For example:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <01 :1901=1386:021= 1111001<10001< ;0000090134;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is a result on the image attached.  It's close, but the last
>>>>>>>>>>>>>>>> '<' in the result text doesn't exist in the image.  It's a
>>>>>>>>>>>>>>>> small failure, but it causes greater trouble in parsing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What would you suggest from here to increase accuracy?  
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Increase the number of lines in the training text
>>>>>>>>>>>>>>>>    - Mix up more variations in the training text
>>>>>>>>>>>>>>>>    - Increase the number of iterations
>>>>>>>>>>>>>>>>    - Investigate wrong reads one by one
>>>>>>>>>>>>>>>>    - Or else?
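As one possible reading of "mix up more variations in the training text", here is a Python sketch that generates randomized lines over the same character set as the sample line above. It uses the ASCII stand-ins seen in this thread for the four E13B symbols ("<", ":", "=", ";"). The field lengths, fields per line, and function names are arbitrary assumptions, and whether random lines train better than realistic cheque lines is not established here.

```python
import random

DIGITS = "0123456789"
SYMBOLS = "<:=;"  # ASCII stand-ins for the four E13B symbols, as used in this thread

def random_field(rng, min_len=2, max_len=12):
    """A run of digits, optionally preceded and followed by a stand-in symbol."""
    body = "".join(rng.choice(DIGITS) for _ in range(rng.randint(min_len, max_len)))
    # Choosing " " and stripping it yields "no symbol" about one time in five.
    prefix = rng.choice(SYMBOLS + " ").strip()
    suffix = rng.choice(SYMBOLS + " ").strip()
    return prefix + body + suffix

def make_training_text(n_lines, seed=0, fields_per_line=(3, 6)):
    """Return n_lines of space-separated random fields; seeded for repeatability."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        count = rng.randint(*fields_per_line)
        lines.append(" ".join(random_field(rng) for _ in range(count)))
    return lines

for text_line in make_training_text(3):
    print(text_line)
```

Writing the result to a file with one line per row gives something that could be passed as the --training_text file in the tesstrain.sh command quoted further down in this thread.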
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, I referred to engrestrict*.* and could generate a
>>>>>>>>>>>>>>>> similar result with the fine-tuning-from-full method.  It
>>>>>>>>>>>>>>>> seems a bit faster to get to the same level, but it also
>>>>>>>>>>>>>>>> stops at a 'good' level.  I can go either way if it takes me
>>>>>>>>>>>>>>>> to the bright future.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> ElMagoElGato
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thursday, 30 May 2019 at 15:56:02 UTC+9, ElGato ElMago wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks a lot, Shree. I'll look into it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thursday, 30 May 2019 at 14:39:52 UTC+9, shree wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> See https://github.com/Shreeshrii/tessdata_shreetest
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Look at the files engrestrict*.* and also 
>>>>>>>>>>>>>>>>>> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Create training text of about 100 lines and finetune for 
>>>>>>>>>>>>>>>>>> 400 lines 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago <
>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I had about 14 lines as attached.  How many lines would 
>>>>>>>>>>>>>>>>>>> you recommend?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Fine-tuning gives a much better result, but it tends to
>>>>>>>>>>>>>>>>>>> pick characters outside E13B, which has only 14
>>>>>>>>>>>>>>>>>>> characters: 0 through 9 and 4 symbols.  I thought training
>>>>>>>>>>>>>>>>>>> from scratch would eliminate such confusion.
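A quick way to see how often a model strays outside the 14-character set is to scan its text output. The Python sketch below assumes the ASCII stand-ins used elsewhere in this thread for the four E13B symbols ("<", ":", "=", ";"); foreign_characters is a hypothetical helper name, and the sample string is made up.

```python
# 10 digits + 4 stand-in symbols = the 14 characters of E13B.
E13B_ALPHABET = set("0123456789<:=;")

def foreign_characters(text):
    """Return (position, character) pairs that fall outside the E13B alphabet."""
    return [(i, ch) for i, ch in enumerate(text)
            if ch not in E13B_ALPHABET and not ch.isspace()]

# Made-up OCR output in which a letter O was read instead of a zero.
print(foreign_characters("<01 :19O1=1386:"))  # prints [(7, 'O')]
```

This only detects the confusion after the fact; it does not stop the model from producing such characters.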
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thursday, 30 May 2019 at 10:43:08 UTC+9, shree wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For training from scratch, a large training text and
>>>>>>>>>>>>>>>>>>>> hundreds of thousands of iterations are recommended.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you are just fine-tuning for a font, try to follow the
>>>>>>>>>>>>>>>>>>>> instructions for fine-tuning for Impact, using your font.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, <
>>>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks, Shree.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, I saw the instruction.  The steps I made are as 
>>>>>>>>>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Using tesstrain.sh:
>>>>>>>>>>>>>>>>>>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts \
>>>>>>>>>>>>>>>>>>>>>   --lang eng --linedata_only \
>>>>>>>>>>>>>>>>>>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>>>>>>>>>>>>>>>>>>   --tessdata_dir ./tessdata \
>>>>>>>>>>>>>>>>>>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>>>>>>>>>>>>>>>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Training from scratch:
>>>>>>>>>>>>>>>>>>>>> mkdir -p ~/tesstutorial/e13boutput
>>>>>>>>>>>>>>>>>>>>> src/training/lstmtraining --debug_interval 100 \
>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>>>>>>>>>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>>>>>>>>>>>>>>>>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>>>>>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>>>>>>>>>>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Test with base_checkpoint:
>>>>>>>>>>>>>>>>>>>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Combining output files:
>>>>>>>>>>>>>>>>>>>>> src/training/lstmtraining --stop_training \
>>>>>>>>>>>>>>>>>>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Test with eng.traineddata:
>>>>>>>>>>>>>>>>>>>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The training from scratch ended as:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%,
>>>>>>>>>>>>>>>>>>>>> char train=0%, word train=0%, skip ratio=0%,  New best
>>>>>>>>>>>>>>>>>>>>> char error = 0 wrote best
>>>>>>>>>>>>>>>>>>>>> model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint
>>>>>>>>>>>>>>>>>>>>> wrote checkpoint.
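When comparing several runs, it can help to pull the numbers out of progress lines like the one above. The Python sketch below parses that exact line shape with a regular expression; the pattern is derived only from the single line quoted here, so other lstmtraining versions may print something different, and parse_progress is a hypothetical helper name. Note that in the commands above the training and evaluation list files are the same file, so a 0% character error here says little about accuracy on unseen images.

```python
import re

# Pattern written against the one progress line quoted in this thread.
PROGRESS_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+), Mean rms=([\d.]+)%, delta=([\d.]+)%, "
    r"char train=([\d.]+)%, word train=([\d.]+)%, skip ratio=([\d.]+)%"
)

def parse_progress(line):
    """Return the numeric fields of one progress line, or None if it does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    g = m.groups()
    return {
        "iterations": tuple(int(v) for v in g[:3]),  # the three counters, uninterpreted
        "mean_rms": float(g[3]),
        "delta": float(g[4]),
        "char_train": float(g[5]),
        "word_train": float(g[6]),
        "skip_ratio": float(g[7]),
    }

line = ("At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, "
        "char train=0%, word train=0%, skip ratio=0%,  New best char error = 0")

print(parse_progress(line))
```

Applied to each line of basetrain.log, this gives a simple table of error rates over the course of training.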
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The test with base_checkpoint returns nothing but:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The test with eng.traineddata and e13b.png returns 
>>>>>>>>>>>>>>>>>>>>> out.txt.  Both files are attached.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Training seems to have worked fine.  I don't know how
>>>>>>>>>>>>>>>>>>>>> to interpret the test result from base_checkpoint.  The
>>>>>>>>>>>>>>>>>>>>> generated eng.traineddata obviously doesn't work well.
>>>>>>>>>>>>>>>>>>>>> I suspect the choice of --traineddata in combining the
>>>>>>>>>>>>>>>>>>>>> output files is bad, but I have no clue.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> ElMagoElGato
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> BTW, I referred to your tess4training in the process.  
>>>>>>>>>>>>>>>>>>>>> It helped a lot.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wednesday, 29 May 2019 at 19:14:08 UTC+9, shree wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> see 
>>>>>>>>>>>>>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago <
>>>>>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I wish to make trained data for the E13B font.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I read the training tutorial and made a
>>>>>>>>>>>>>>>>>>>>>>> base_checkpoint file according to the method in
>>>>>>>>>>>>>>>>>>>>>>> Training From Scratch.  Now, how can I make trained
>>>>>>>>>>>>>>>>>>>>>>> data from the base_checkpoint file?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed 
>>>>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving 
>>>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to 
>>>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com.
>>>>>>>>>>>>>>>>>>>>>>> To post to this group, send email to 
>>>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com.
>>>>>>>>>>>>>>>>>>>>>>> Visit this group at 
>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com
>>>>>>>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>>>> For more options, visit 
>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/optout.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>>>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>>>>>>>>>>>>>


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/735437e2-16ca-42d5-8a8b-fdb7dfb7cb98%40googlegroups.com.
