Hi,
I got this result from hocr. This is where one of the phantom characters comes from.

<span class='ocrx_cinfo' title='x_bboxes 1259 902 1262 933; x_conf 98.864532'><</span>
<span class='ocrx_cinfo' title='x_bboxes 1259 904 1281 933; x_conf 99.018097'>;</span>

The first character is the phantom. It starts at the same x position as the second character, which really exists in the image, and it is only 3 pixels wide.

I attach ScrollView screenshots that visualize this.

[image: 2019-07-24-132643_854x707_scrot.png]
[image: 2019-07-24-132800_854x707_scrot.png]

There seem to be more cases that cause phantom characters. I'll look into them. But I have a trivial question now. I got ScrollView to show these displays by accidentally clicking the Display->Blamer menu. There is a Bounding Boxes menu below it, but it ends up showing a blue screen, though it briefly shows boxes on the way. Can I use this menu at all? It would be very useful.

[image: 2019-07-24-140739_854x707_scrot.png]

On Tuesday, July 23, 2019 at 17:10:36 UTC+9, ElGato ElMago wrote:
>
> It's great! Perfect! Thanks a lot!
>
> On Tuesday, July 23, 2019 at 10:56:58 UTC+9, shree wrote:
>>
>> See https://github.com/tesseract-ocr/tesseract/issues/2580
>>
>> On Tue, 23 Jul 2019, 06:23 ElGato ElMago, <elmago...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I read the hocr output with lstm_choice_mode = 4, as per pull
>>> request 2554. It shows the candidates for each character but doesn't show
>>> the bounding box of each character. It only shows the box for a whole word.
>>>
>>> I see bounding boxes of each character in the comments of pull request
>>> 2576. How can I do that? Do I have to look in the source code and
>>> produce such output on my own?
>>>
>>> On Friday, July 19, 2019 at 18:40:49 UTC+9, ElGato ElMago wrote:
>>>
>>>> Lorenzo,
>>>>
>>>> I haven't been checking psm too much. Will turn to those options after
>>>> I see how it goes with bounding boxes.
>>>>
>>>> Shree,
>>>>
>>>> I see the merges in the git log and also see that the new
>>>> option lstm_choice_amount works now.
>>>> I guess my executable is the latest,
>>>> though I still see the phantom character. Hocr makes huge and complex
>>>> output. I'll take some time to read it.
>>>>
>>>> On Friday, July 19, 2019 at 18:20:55 UTC+9, Claudiu wrote:
>>>>>
>>>>> Is there any way to pass bounding boxes to the LSTM? We have an
>>>>> algorithm that cleanly gets bounding boxes of MRZ characters. However, the
>>>>> results using psm 10 are worse than passing the whole line in. Yet when we
>>>>> pass the whole line in we get these phantom characters.
>>>>>
>>>>> Should PSM 10 mode work? It often returns “no character” where there
>>>>> clearly is one. I can supply a test case if it is expected to work well.
>>>>>
>>>>> On Fri, Jul 19, 2019 at 11:06 AM ElGato ElMago <elmago...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Lorenzo,
>>>>>>
>>>>>> We both have got the same case. It seems a solution to this problem
>>>>>> would save a lot of people.
>>>>>>
>>>>>> Shree,
>>>>>>
>>>>>> I pulled the current head of the master branch, but it doesn't seem to
>>>>>> contain the merges you pointed to, which were merged 3 to 4 days ago.
>>>>>> How can I get them?
>>>>>>
>>>>>> ElMagoElGato
>>>>>>
>>>>>> On Friday, July 19, 2019 at 17:02:53 UTC+9, Lorenzo Blz wrote:
>>>>>>>
>>>>>>> PSM 7 was a partial solution for my specific case; it improved the
>>>>>>> situation but did not solve it. Also, I could not use it in some other
>>>>>>> cases.
>>>>>>>
>>>>>>> The proper solution is very likely doing more training with more
>>>>>>> data; some data augmentation might help if data is scarce.
>>>>>>> Also, doing less training might help if the training is not done
>>>>>>> correctly.
>>>>>>>
>>>>>>> There are also similar issues on GitHub:
>>>>>>>
>>>>>>> https://github.com/tesseract-ocr/tesseract/issues/1465
>>>>>>> ...
>>>>>>>
>>>>>>> The LSTM engine works like this: it scans the image and for each
>>>>>>> "pixel column" does this:
>>>>>>>
>>>>>>> M M M M N M M M [BLANK] F F F F
>>>>>>>
>>>>>>> (here I report only the highest-probability characters)
>>>>>>>
>>>>>>> In the example above an M is partially seen as an N. This is normal,
>>>>>>> and another step of the algorithm (beam search, I think) tries to
>>>>>>> aggregate the correct characters back.
>>>>>>>
>>>>>>> I think cases like this:
>>>>>>>
>>>>>>> M M M N N N M M
>>>>>>>
>>>>>>> are what give the phantom characters. More training should reduce
>>>>>>> the source of the problem, or a painful analysis of the bounding boxes
>>>>>>> might fix some cases.
>>>>>>>
>>>>>>> I used the attached script for the boxes.
>>>>>>>
>>>>>>> Lorenzo
>>>>>>>
>>>>>>> On Fri, 19 Jul 2019 at 07:25, ElGato ElMago <elmago...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Let's call them phantom characters then.
>>>>>>>>
>>>>>>>> Was psm 7 the solution for issue 1778? None of the psm options
>>>>>>>> solved my problem, though I see different output.
>>>>>>>>
>>>>>>>> I use tesseract 5.0-alpha mostly but 4.1 showed the same results
>>>>>>>> anyway. How did you get the bounding box for each character? Alto and
>>>>>>>> lstmbox only show the bbox for a group of characters.
>>>>>>>>
>>>>>>>> ElMagoElGato
>>>>>>>>
>>>>>>>> On Wednesday, July 17, 2019 at 18:58:31 UTC+9, Lorenzo Blz wrote:
>>>>>>>>
>>>>>>>>> Phantom characters here for me too:
>>>>>>>>>
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/issues/1778
>>>>>>>>>
>>>>>>>>> Are you using 4.1? Bounding boxes were fixed in 4.1; maybe this was
>>>>>>>>> also improved.
>>>>>>>>>
>>>>>>>>> I wrote some code that uses the symbol iterator to discard symbols
>>>>>>>>> that are clearly duplicated: too small, overlapping, etc.
But it was >>>>>>>>> not >>>>>>>>> easy to make it work decently and it is not 100% reliable with false >>>>>>>>> negatives and positives. I cannot share the code and it is quite ugly >>>>>>>>> anyway. >>>>>>>>> >>>>>>>>> Here there is another MRZ model with training data: >>>>>>>>> >>>>>>>>> https://github.com/DoubangoTelecom/tesseractMRZ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Lorenzo >>>>>>>>> >>>>>>>>> >>>>>>>>> Il giorno mer 17 lug 2019 alle ore 11:26 Claudiu < >>>>>>>>> csaf...@gmail.com> ha scritto: >>>>>>>>> >>>>>>>>>> I’m getting the “phantom character” issue as well using the OCRB >>>>>>>>>> that Shree trained on MRZ lines. For example for a 0 it will >>>>>>>>>> sometimes add >>>>>>>>>> both a 0 and an O to the output , thus outputting 45 characters >>>>>>>>>> total >>>>>>>>>> instead of 44. I haven’t looked at the bounding box output yet but I >>>>>>>>>> suspect a phantom thin character is added somewhere that I can >>>>>>>>>> discard .. >>>>>>>>>> or maybe two chars will have the same bounding box. If anyone else >>>>>>>>>> has >>>>>>>>>> fixed this issue further up (eg so the output doesn’t contain the >>>>>>>>>> phantom >>>>>>>>>> characters in the first place) id be interested. >>>>>>>>>> >>>>>>>>>> On Wed, Jul 17, 2019 at 10:01 AM ElGato ElMago < >>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I'll go back to more of training later. Before doing so, I'd >>>>>>>>>>> like to investigate results a little bit. The hocr and lstmbox >>>>>>>>>>> options >>>>>>>>>>> give some details of positions of characters. The results show >>>>>>>>>>> positions >>>>>>>>>>> that perfectly correspond to letters in the image. But the text >>>>>>>>>>> output >>>>>>>>>>> contains a character that obviously does not exist. >>>>>>>>>>> >>>>>>>>>>> Then I found a config file 'lstmdebug' that generates far more >>>>>>>>>>> information. I hope it explains what happened with each character. 
>>>>>>>>>>> I'm >>>>>>>>>>> yet to read the debug output but I'd appreciate it if someone could >>>>>>>>>>> tell me >>>>>>>>>>> how to read it because it's really complex. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> ElMagoElGato >>>>>>>>>>> >>>>>>>>>>> 2019年6月14日金曜日 19時58分49秒 UTC+9 shree: >>>>>>>>>>> >>>>>>>>>>>> See https://github.com/Shreeshrii/tessdata_MICR >>>>>>>>>>>> >>>>>>>>>>>> I have uploaded my files there. >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/Shreeshrii/tessdata_MICR/blob/master/MICR.sh >>>>>>>>>>>> is the bash script that runs the training. >>>>>>>>>>>> >>>>>>>>>>>> You can modify as needed. Please note this is for legacy/base >>>>>>>>>>>> tesseract --oem 0. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jun 14, 2019 at 1:26 PM ElGato ElMago < >>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks a lot, shree. It seems you know everything. >>>>>>>>>>>>> >>>>>>>>>>>>> I tried the MICR0.traineddata and the first two >>>>>>>>>>>>> mcr.traineddata. The last one was blocked by the browser. Each >>>>>>>>>>>>> of the >>>>>>>>>>>>> traineddata had mixed results. All of them are getting symbols >>>>>>>>>>>>> fairly good >>>>>>>>>>>>> but getting spaces randomly and reading some numbers wrong. >>>>>>>>>>>>> >>>>>>>>>>>>> MICR0 seems the best among them. Did you suggest that you'd >>>>>>>>>>>>> be able to update it? It gets tripple D very often where there's >>>>>>>>>>>>> only one, >>>>>>>>>>>>> and so on. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, I tried to fine tune from MICR0 but I found that I need >>>>>>>>>>>>> to change the language-specific.sh. It specifies some parameters >>>>>>>>>>>>> for each >>>>>>>>>>>>> language. Do you have any guidance for it? 
>>>>>>>>>>>>> >>>>>>>>>>>>> 2019年6月14日金曜日 1時48分40秒 UTC+9 shree: >>>>>>>>>>>>>> >>>>>>>>>>>>>> see >>>>>>>>>>>>>> http://www.devscope.net/Content/ocrchecks.aspx >>>>>>>>>>>>>> https://github.com/BigPino67/Tesseract-MICR-OCR >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/obWI4cz8rXg/6l82hEySgOgJ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Jun 10, 2019 at 11:21 AM ElGato ElMago < >>>>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> That'll be nice if there's traineddata out there but I >>>>>>>>>>>>>>> didn't find any. I see free fonts and commercial OCR software >>>>>>>>>>>>>>> but not >>>>>>>>>>>>>>> traineddata. Tessdata repository obviously doesn't have one, >>>>>>>>>>>>>>> either. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2019年6月8日土曜日 1時52分10秒 UTC+9 shree: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please also search for existing MICR traineddata files. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jun 6, 2019 at 1:09 PM ElGato ElMago < >>>>>>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So I did several tests from scratch. In the last attempt, >>>>>>>>>>>>>>>>> I made a training text with 4,000 lines in the following >>>>>>>>>>>>>>>>> format, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 110004310510< <02 :4002=0181:801= 0008752 <00039 >>>>>>>>>>>>>>>>> ;0000001000; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> and combined it with eng.digits.training_text in which >>>>>>>>>>>>>>>>> symbols are converted to E13B symbols. This makes about >>>>>>>>>>>>>>>>> 12,000 lines of >>>>>>>>>>>>>>>>> training text. It's amazing that this thing generates a good >>>>>>>>>>>>>>>>> reader out of >>>>>>>>>>>>>>>>> nowhere. But then it is not very good. For example: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> <01 :1901=1386:021= 1111001<10001< ;0000090134; >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> is a result on the image attached. It's close but the >>>>>>>>>>>>>>>>> last '<' in the result text doesn't exist on the image. 
It's >>>>>>>>>>>>>>>>> a small >>>>>>>>>>>>>>>>> failure but it causes a greater trouble in parsing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What would you suggest from here to increase accuracy? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - Increase the number of lines in the training text >>>>>>>>>>>>>>>>> - Mix up more variations in the training text >>>>>>>>>>>>>>>>> - Increase the number of iterations >>>>>>>>>>>>>>>>> - Investigate wrong reads one by one >>>>>>>>>>>>>>>>> - Or else? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Also, I referred to engrestrict*.* and could generate >>>>>>>>>>>>>>>>> similar result with the fine-tuning-from-full method. It >>>>>>>>>>>>>>>>> seems a bit >>>>>>>>>>>>>>>>> faster to get to the same level but it also stops at a 'good' >>>>>>>>>>>>>>>>> level. I can >>>>>>>>>>>>>>>>> go with either way if it takes me to the bright future. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> ElMagoElGato >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2019年5月30日木曜日 15時56分02秒 UTC+9 ElGato ElMago: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks a lot, Shree. I'll look it in. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2019年5月30日木曜日 14時39分52秒 UTC+9 shree: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> See https://github.com/Shreeshrii/tessdata_shreetest >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Look at the files engrestrict*.* and also >>>>>>>>>>>>>>>>>>> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Create training text of about 100 lines and finetune for >>>>>>>>>>>>>>>>>>> 400 lines >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, May 30, 2019 at 9:38 AM ElGato ElMago < >>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I had about 14 lines as attached. How many lines would >>>>>>>>>>>>>>>>>>>> you recommend? 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Fine tuning gives much better result but it tends to >>>>>>>>>>>>>>>>>>>> pick other character than in E13B that only has 14 >>>>>>>>>>>>>>>>>>>> characters, 0 through 9 >>>>>>>>>>>>>>>>>>>> and 4 symbols. I thought training from scratch would >>>>>>>>>>>>>>>>>>>> eliminate such >>>>>>>>>>>>>>>>>>>> confusion. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2019年5月30日木曜日 10時43分08秒 UTC+9 shree: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For training from scratch a large training text and >>>>>>>>>>>>>>>>>>>>> hundreds of thousands of iterations are recommended. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> If you are just fine tuning for a font try to follow >>>>>>>>>>>>>>>>>>>>> instructions for training for impact, with your font. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, 30 May 2019, 06:05 ElGato ElMago, < >>>>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, Shree. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes, I saw the instruction. 
>>>>>>>>>>>>>>>>>>>>>> The steps I took are as follows:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Using tesstrain.sh:
>>>>>>>>>>>>>>>>>>>>>> src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>>>>>>>>>>>>>>>>>>>>>>   --noextract_font_properties --langdata_dir ../langdata \
>>>>>>>>>>>>>>>>>>>>>>   --tessdata_dir ./tessdata \
>>>>>>>>>>>>>>>>>>>>>>   --fontlist "E13Bnsd" --output_dir ~/tesstutorial/e13beval \
>>>>>>>>>>>>>>>>>>>>>>   --training_text ../langdata/eng/eng.training_e13b_text
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Training from scratch:
>>>>>>>>>>>>>>>>>>>>>> mkdir -p ~/tesstutorial/e13boutput
>>>>>>>>>>>>>>>>>>>>>> src/training/lstmtraining --debug_interval 100 \
>>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>>   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
>>>>>>>>>>>>>>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/base --learning_rate 20e-4 \
>>>>>>>>>>>>>>>>>>>>>>   --train_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>>>>>>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt \
>>>>>>>>>>>>>>>>>>>>>>   --max_iterations 5000 &>~/tesstutorial/e13boutput/basetrain.log
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Test with base_checkpoint:
>>>>>>>>>>>>>>>>>>>>>> src/training/lstmeval --model ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>>   --eval_listfile ~/tesstutorial/e13beval/eng.training_files.txt
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Combining output files:
>>>>>>>>>>>>>>>>>>>>>> src/training/lstmtraining --stop_training \
>>>>>>>>>>>>>>>>>>>>>>   --continue_from ~/tesstutorial/e13boutput/base_checkpoint \
>>>>>>>>>>>>>>>>>>>>>>   --traineddata ~/tesstutorial/e13beval/eng/eng.traineddata \
>>>>>>>>>>>>>>>>>>>>>>   --model_output ~/tesstutorial/e13boutput/eng.traineddata
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Test with eng.traineddata:
>>>>>>>>>>>>>>>>>>>>>> tesseract e13b.png out --tessdata-dir /home/koichi/tesstutorial/e13boutput
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The training from scratch ended as:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At iteration 561/2500/2500, Mean rms=0.159%, delta=0%, char train=0%, word train=0%, skip ratio=0%, New best char error = 0 wrote best model:/home/koichi/tesstutorial/e13boutput/base0_561.checkpoint wrote checkpoint.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The test with base_checkpoint returns nothing, as follows:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The test with eng.traineddata and e13b.png returns
>>>>>>>>>>>>>>>>>>>>>> out.txt. Both files are attached.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Training seems to have worked fine. I don't know how
>>>>>>>>>>>>>>>>>>>>>> to interpret the test result from base_checkpoint. The generated
>>>>>>>>>>>>>>>>>>>>>> eng.traineddata obviously doesn't work well. I suspect the choice of
>>>>>>>>>>>>>>>>>>>>>> --traineddata in combining output files is bad but I have no clue.
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>> ElMagoElGato >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> BTW, I referred to your tess4training in the >>>>>>>>>>>>>>>>>>>>>> process. It helped a lot. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2019年5月29日水曜日 19時14分08秒 UTC+9 shree: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, May 29, 2019 at 3:18 PM ElGato ElMago < >>>>>>>>>>>>>>>>>>>>>>> elmago...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I wish to make a trained data for E13B font. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I read the training tutorial and made a >>>>>>>>>>>>>>>>>>>>>>>> base_checkpoint file according to the method in >>>>>>>>>>>>>>>>>>>>>>>> Training From Scratch. >>>>>>>>>>>>>>>>>>>>>>>> Now, how can I make a trained data from the >>>>>>>>>>>>>>>>>>>>>>>> base_checkpoint file? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>> You received this message because you are >>>>>>>>>>>>>>>>>>>>>>>> subscribed to the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving >>>>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to >>>>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>>>>>> To post to this group, send email to >>>>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>>>>>> Visit this group at >>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr. 
>>>>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/4848cfa5-ae2b-4be3-a771-686aa0fec702%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>>>>>>>>>>>>>>> . >>>>>>>>>>>>>>>>>>>>>>>> For more options, visit >>>>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/optout. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> ____________________________________________________________ >>>>>>>>>>>>>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed >>>>>>>>>>>>>>>>>>>>>> to the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving >>>>>>>>>>>>>>>>>>>>>> emails from it, send an email to >>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>>>> To post to this group, send email to >>>>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>>>> Visit this group at >>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr. >>>>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/7f29f47e-c6f5-4743-832d-94e7d28ab4e8%40googlegroups.com >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/7f29f47e-c6f5-4743-832d-94e7d28ab4e8%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>>>>>>>>>>>>> . >>>>>>>>>>>>>>>>>>>>>> For more options, visit >>>>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/optout. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> You received this message because you are subscribed to >>>>>>>>>>>>>>>>>>>> the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving >>>>>>>>>>>>>>>>>>>> emails from it, send an email to >>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>> To post to this group, send email to >>>>>>>>>>>>>>>>>>>> tesser...@googlegroups.com. >>>>>>>>>>>>>>>>>>>> Visit this group at >>>>>>>>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr. >>>>>>>>>>>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2c6fe865-911d-41f3-9926-cbfb56db794f%40googlegroups.com >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/2c6fe865-911d-41f3-9926-cbfb56db794f%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>>>>>>>>>>> . >>>>>>>>>>>>>>>>>>>> For more options, visit >>>>>>>>>>>>>>>>>>>> https://groups.google.com/d/optout. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ____________________________________________________________ >>>>>>>>>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> You received this message because you are subscribed to >>>>>>>>>>>>>>>>> the Google Groups "tesseract-ocr" group. >>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails >>>>>>>>>>>>>>>>> from it, send an email to tesser...@googlegroups.com. >>>>>>>>>>>>>>>>> To post to this g >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. 
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7f1fd2ea-3cd9-4d75-a037-2b2390c4271d%40googlegroups.com.
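
For reference, the check on the two hocr character boxes quoted at the top of this message can be sketched in Python. This is only a rough sketch, not something that ships with Tesseract: the regular expression assumes the ocrx_cinfo spans look exactly like the two quoted above, and parse_chars, find_phantoms and the min_width threshold of 5 pixels are names and values made up for illustration. It flags a character whose box is very narrow and lies inside the horizontal extent of a neighbouring character.

```python
import re

# Matches one ocrx_cinfo span: bounding box, confidence, and the character text.
# The pattern is an assumption based on the two spans quoted in this thread.
CINFO = re.compile(
    r"<span class='ocrx_cinfo' title='x_bboxes (\d+) (\d+) (\d+) (\d+); "
    r"x_conf ([\d.]+)'>(.*?)</span>"
)


def parse_chars(hocr):
    """Return a list of (text, (x0, y0, x1, y1), conf), one per character span."""
    chars = []
    for m in CINFO.finditer(hocr):
        x0, y0, x1, y1 = (int(m.group(i)) for i in range(1, 5))
        chars.append((m.group(6), (x0, y0, x1, y1), float(m.group(5))))
    return chars


def find_phantoms(chars, min_width=5):
    """Return indexes of characters narrower than min_width pixels whose
    x range lies inside the x range of the previous or next character.

    min_width is an assumed threshold; tune it for the font and resolution.
    """
    phantoms = []
    for i, (_, (x0, _, x1, _), _) in enumerate(chars):
        if x1 - x0 >= min_width:
            continue  # wide enough to be a real glyph
        for j in (i - 1, i + 1):
            if 0 <= j < len(chars):
                nx0, _, nx1, _ = chars[j][1]
                if nx0 <= x0 and x1 <= nx1:
                    phantoms.append(i)
                    break
    return phantoms


sample = (
    "<span class='ocrx_cinfo' title='x_bboxes 1259 902 1262 933; x_conf 98.864532'><</span> "
    "<span class='ocrx_cinfo' title='x_bboxes 1259 904 1281 933; x_conf 99.018097'>;</span>"
)
chars = parse_chars(sample)
print(find_phantoms(chars))  # prints [0]
```

On the two spans above it flags index 0, the 3-pixel-wide '<' that starts at the same x position as the ';'. In a real hocr file the character may be written as an entity such as &lt;, and the pattern simply captures whatever text sits inside the span. A high x_conf does not help here, since the phantom has a confidence of about 98.9, so the sketch looks at geometry only.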