Here it is: https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_best.md
On Sunday, October 22, 2023 at 12:45:40 PM UTC+3 Des Bw wrote:

This is the command I used to train from a layer:

    make training MODEL_NAME=amh START_MODEL=amh APPEND_INDEX=5 NET_SPEC='[Lfx256 O1c105]' TESSDATA=../tesseract/tessdata EPOCHS=3 TARGET_ERROR_RATE=0.0001 training >> data/amh.log &

- I took it from Shreeshrii's training repo tesstrain-JSTORArabic: https://github.com/Shreeshrii/tesstrain-JSTORArabic
- The net_spec of ben might not be the same as amh. Shreeshrii has posted a link to the net specs of the various languages in this forum.

On Sunday, October 22, 2023 at 12:09:25 PM UTC+3 mdalihu...@gmail.com wrote:

You can test by changing '--char_spacing=1.0'. I think it could also be a problem for the accuracy of the result.

On Sunday, 22 October, 2023 at 3:07:16 pm UTC+6 Ali hussain wrote:

I haven't tried cutting the top layer of the network. Can you share what you did when you cut the top layer, or a GitHub project link?

On Sunday, 22 October, 2023 at 12:27:32 pm UTC+6 desal...@gmail.com wrote:

That is massive data. Have you tried training by cutting the top layer of the network? I think that is the most promising approach. I was getting really good results with it, but the results do not carry over to scanned documents; I get the best results on the synthetic data. I am now experimenting with the text2image settings to see whether it is possible to emulate scanned documents. I also suspect that the '--char_spacing=1.0' setting in our setup is causing trouble: scanned documents come with character spacing close to zero. If you are planning to train more, try removing this parameter.

On Sunday, October 22, 2023 at 4:09:46 AM UTC+3 mdalihu...@gmail.com wrote:

600,000 lines of text, and more than 600,000 iterations.
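The text2image experiment described above (varying spacing and exposure to better match scanned pages) could be scripted roughly as follows. Only the --char_spacing and --exposure flags are taken from the scripts shown later in this thread; the parameter grid, font, and paths are illustrative assumptions, not a tested recipe.

```python
# Sketch: build text2image command lines for a small parameter sweep,
# so synthetic renderings can be compared against real scanned documents.
# The grid values and paths below are illustrative assumptions.
def build_text2image_cmds(font, text_file, out_dir):
    cmds = []
    for spacing in (0.0, 0.5, 1.0):      # scanned pages have spacing close to 0
        for exposure in (-1, 0, 1):      # vary ink darkness
            base = f"{out_dir}/sweep_cs{spacing}_exp{exposure}"
            cmds.append([
                "text2image",
                f"--font={font}",
                f"--text={text_file}",
                f"--outputbase={base}",
                f"--char_spacing={spacing}",
                f"--exposure={exposure}",
            ])
    return cmds
```

Running each list through subprocess.run() would produce one .tif/.box pair per setting, which can then be inspected side by side with a real scan to pick the closest-looking combination.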
But sometimes I got better results with fewer iterations when fine-tuning, e.g. 100,000 lines of text and only 5,000 to 10,000 iterations.

On Saturday, 21 October, 2023 at 11:37:13 am UTC+6 desal...@gmail.com wrote:

How many lines of text and how many iterations did you use?

On Saturday, October 21, 2023 at 8:36:38 AM UTC+3 Des Bw wrote:

Yeah, that is what I am getting as well. I was able to add the missing letter, but the overall accuracy became lower than the default model.

On Saturday, October 21, 2023 at 3:22:44 AM UTC+3 mdalihu...@gmail.com wrote:

Not a good result. That's why I have stopped training for now. The default traineddata is better overall than training from scratch.

On Thursday, 19 October, 2023 at 11:32:08 pm UTC+6 desal...@gmail.com wrote:

Hi Ali,
How is your training going? Are you getting good results with the training-from-scratch?

On Friday, September 15, 2023 at 6:42:26 PM UTC+3 tesseract-ocr wrote:

Yes, I saw that two months ago when I started to learn OCR. It was very helpful at the beginning.

On Friday, 15 September, 2023 at 4:01:32 pm UTC+6 desal...@gmail.com wrote:

Just saw this paper: https://osf.io/b8h7q

On Thursday, September 14, 2023 at 9:02:22 PM UTC+3 mdalihu...@gmail.com wrote:

I will try some changes. Thanks.

On Thursday, 14 September, 2023 at 2:46:36 pm UTC+6 elvi...@gmail.com wrote:

I also faced that issue on Windows. Apparently the issue is related to Unicode. You can try your luck by changing "r" to "utf8" in the script.
I ended up installing Ubuntu because I was getting too many errors on Windows.

On Thu, Sep 14, 2023, 9:33 AM Ali hussain <mdalihu...@gmail.com> wrote:

Have you faced this error: "Can't encode transcription"? If so, how did you solve it?

On Thursday, 14 September, 2023 at 10:51:52 am UTC+6 elvi...@gmail.com wrote:

I was using my own text.

On Thu, Sep 14, 2023, 6:58 AM Ali hussain <mdalihu...@gmail.com> wrote:

Are you training from Tesseract's default text data or from your own collected text data?

On Thursday, 14 September, 2023 at 12:19:53 am UTC+6 desal...@gmail.com wrote:

I have now reached 200,000 iterations, and the error rate is stuck at 0.46. The result is absolutely trash: nowhere close to the default/Ray's training.

On Wednesday, September 13, 2023 at 2:47:05 PM UTC+3 mdalihu...@gmail.com wrote:

After Tesseract recognizes the text from the images, you can apply a regex to replace the wrong words with the correct ones. I'm not familiar with PaddleOCR, nor with ScanTailor.

On Wednesday, 13 September, 2023 at 5:06:12 pm UTC+6 desal...@gmail.com wrote:

At what stage are you doing the regex replacement?
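The "r" → "utf8" change mentioned above presumably refers to passing an explicit encoding when opening the training text: on Windows, open() defaults to the locale code page (e.g. cp1252), which cannot handle Bengali text. A minimal sketch, with the function name and path being illustrative only:

```python
# Sketch: read a training-text file with an explicit UTF-8 encoding.
# Without encoding="utf-8", Windows falls back to the locale code page
# and raises Unicode errors on non-Latin scripts such as Bengali.
def read_training_lines(path):
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

The same one-argument change applies anywhere the training scripts in this thread call open(..., 'r') on text that may contain non-ASCII characters.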
My process has been: Scan (tif) --> ScanTailor --> Tesseract --> pdf.

> EasyOCR I think is best for ID cards and that kind of image processing, but for document images like books, Tesseract is better than EasyOCR.

How about PaddleOCR? Are you familiar with it?

On Wednesday, September 13, 2023 at 1:45:54 PM UTC+3 mdalihu...@gmail.com wrote:

I know what you mean, but in some cases it helps me. I have found that specific characters and words are consistently not recognized by Tesseract, so I use these regexes to replace those characters and words when they come out wrong.

Here is what I have done:

    " ী": "ী",
    " ্": " ",
    " ে": " ",
    "জ্া": "জা",
    " ": " ",
    " ": " ",
    " ": " ",
    "্প": " ",
    " য": "র্য",
    "য": "য",
    " া": "া",
    "আা": "আ",
    "ম্ি": "মি",
    "স্ু": "সু",
    "হূ ": "হূ",
    " ণ": "ণ",
    "র্্": "র",
    "চিন্ত ": "চিন্তা ",
    "ন্া": "না",
    "সম ূর্ন": "সম্পূর্ণ",

On Wednesday, 13 September, 2023 at 4:18:22 pm UTC+6 desal...@gmail.com wrote:

The problem with regex is that Tesseract is not consistent in its replacement.
Suppose the original training of the English data didn't contain the letter /u/. What does Tesseract do when it faces /u/ in actual processing? In some cases it replaces it with closely similar letters such as /v/ or /w/; in other cases it removes it completely. That is what is happening in my case: those characters are sometimes removed entirely, and other times replaced by closely resembling characters. Because of this inconsistency, applying regex is very difficult.

On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3 mdalihu...@gmail.com wrote:

If some specific characters or words are always missing from the OCR result, you can apply logic with regular expressions in your application. After OCR, those specific characters or words are replaced by the correct characters or words you define in your application. That can handle some of the major problems.

On Wednesday, 13 September, 2023 at 3:51:29 pm UTC+6 desal...@gmail.com wrote:

The characters are getting missed even after fine-tuning. I never made any progress; I tried many different ways, and some specific characters are always missing from the OCR result.
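The post-OCR correction approach discussed above can be sketched as a fixed substitution table applied to the recognized text. The two entries below are taken from the Bengali mapping posted earlier in the thread; for fixed strings like these, plain str.replace is enough and no regular expressions are needed:

```python
# Sketch: apply a table of known OCR confusions to recognized text.
# The entries are examples from the Bengali mapping in this thread;
# a real table would carry the full language-specific list.
REPLACEMENTS = {
    "চিন্ত ": "চিন্তা ",  # truncated word seen in OCR output
    "আা": "আ",            # spurious extra vowel sign
}

def post_correct(text):
    for wrong, right in REPLACEMENTS.items():
        text = text.replace(wrong, right)
    return text
```

As the reply above notes, this only works for consistent errors; when Tesseract sometimes drops a character and sometimes substitutes it, no single table entry covers both cases.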
On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3 mdalihu...@gmail.com wrote:

EasyOCR, I think, is best for ID cards and that kind of image processing, but for document images like books, Tesseract is better than EasyOCR. I haven't actually used EasyOCR; you can try it.

I have added dictionary words, but the result is the same.

What kind of problem did you face when fine-tuning on a few new characters, as you said ("but I failed in every possible way to introduce a few new characters into the database")?

On Wednesday, 13 September, 2023 at 3:33:48 pm UTC+6 desal...@gmail.com wrote:

Yes, we are new to this. I find the instructions (the manual) very hard to follow; the video you linked above was really helpful for getting started. My plan at the beginning was to fine-tune the existing .traineddata, but I failed in every possible way to introduce a few new characters into the database. That is why I started from scratch.

Sure, I will follow Lorenzo's suggestion: I will run more iterations and see if I can improve.

Another area we need to explore is the use of dictionaries.
Maybe adding millions of words to the dictionary could help Tesseract. I don't have millions of words, but I am looking into some corpora to get more words into the dictionary.

If this all fails, EasyOCR (and probably other similar open-source packages) is our next option to try. Sure, sharing our experiences will be helpful; I will let you know if I make good progress with any of these options.

On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3 mdalihu...@gmail.com wrote:

> How is your training going for Bengali?

It was nearly good, but I ran into spacing problems between words: some words get spaces, but most of them have none. I think the problem is in the dataset, but I used the default training dataset from Tesseract, the one used for ben, so I am confused and have to explore more. By the way, you can try what Lorenzo Blz said. Training from scratch is harder than fine-tuning, so you can explore different datasets. If you succeed, please let me know how you did the whole process. I'm also new to this field.
On Wednesday, 13 September, 2023 at 1:13:43 pm UTC+6 desal...@gmail.com wrote:

How is your training going for Bengali?

I have been trying to train from scratch. I made about 64,000 lines of text (which produced about 255,000 files in the end) and ran the training for 150,000 iterations, getting a 0.51 training error rate. I was hoping for reasonable accuracy. Unfortunately, when I run OCR with the resulting .traineddata, the accuracy is absolutely terrible. Do you think I made some mistakes, or is that an expected result?

On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3 mdalihu...@gmail.com wrote:

Yes, he mentions only one font, not all of them. That is why he didn't use MODEL_NAME in a separate script file, I think.

Actually, here we train on all the tif, gt.txt, and .box files that were created under MODEL_NAME (I mean the eng, ben, oro language code), because when we first create the tif, gt.txt, and .box files, every file name starts with MODEL_NAME.
This MODEL_NAME is what we select in the training script for looping over each of the tif, gt.txt, and .box files created under MODEL_NAME.

On Tuesday, 12 September, 2023 at 9:42:13 pm UTC+6 desal...@gmail.com wrote:

Yes, I am familiar with the video and have set up the folder structure as you did. Indeed, I have tried a number of fine-tuning runs with a single font following Garcia's video. But your script is much better because it supports multiple fonts; the whole improvement you made is brilliant and very useful, and it is all working for me. The only part I didn't understand is the trick you used in your tesseract_train.py script. You see, I have been doing exactly what you did except for this script.

The script seems to have the trick of sending/teaching each of the fonts (iteratively) into the model. The script I have been using (which I got from Garcia) doesn't mention fonts at all:
    TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000

Does this mean that my model doesn't train on the fonts (even though the fonts were included in the splitting process, in the other script)?

On Monday, September 11, 2023 at 10:54:08 AM UTC+3 mdalihu...@gmail.com wrote:

    import subprocess

    # List of font names
    font_names = ['ben']
    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

1. This is the training command; I have put it in a file named 'tesseract_training.py' inside the tesstrain folder.
2. The root directory means your main training folder, which contains the langdata, tesseract, and tesstrain folders.
If you watch this tutorial, https://www.youtube.com/watch?v=KE4xEzFGSU8, you will understand the folder structure better. I only created tesseract_training.py in the tesstrain folder for training; the FontList.py file is in the main path, alongside langdata, tesseract, tesstrain, and split_training_text.py.

3. First of all, you have to put all the fonts in your Linux fonts folder, /usr/share/fonts/, then run: sudo apt update, then sudo fc-cache -fv.

After that, you have to add the exact font names to the FontList.py file as I did. I have attached two pictures of my folder structure: the first is the main structure and the second is the expanded tesstrain folder. [image: Screenshot 2023-09-11 134947.png] [image: Screenshot 2023-09-11 135014.png]

On Monday, 11 September, 2023 at 12:50:03 pm UTC+6 desal...@gmail.com wrote:

Thank you so much for putting out these brilliant scripts. They make the process much more efficient.

I have one more question about the other script that you use to train:
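Since text2image quietly renders nothing useful when a font name does not exactly match what fontconfig reports, a small check like the following can save a debugging round trip before a name is added to FontList.py. This is a sketch assuming a Linux system with fontconfig's fc-list tool on the PATH; it returns None rather than guessing when fc-list is unavailable.

```python
import shutil
import subprocess

def font_installed(font_name):
    """Return True if fontconfig reports a family matching font_name.

    Sketch only: requires the fc-list tool (fontconfig) on PATH.
    Returns None when fc-list is not available.
    """
    if shutil.which("fc-list") is None:
        return None
    # 'fc-list : family' prints installed family names, one line per face,
    # with aliases separated by commas.
    out = subprocess.run(["fc-list", ":", "family"],
                         capture_output=True, text=True).stdout
    families = {fam.strip() for line in out.splitlines()
                for fam in line.split(",")}
    return font_name in families
```

This mirrors the sudo fc-cache -fv step above: after refreshing the cache, each entry intended for FontList.py can be verified against the installed families.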
    import subprocess

    # List of font names
    font_names = ['ben']
    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

Do you have the names of the fonts listed in a file in the same/root directory? How do you set up the names of the fonts in that file, if you don't mind sharing it?

On Monday, September 11, 2023 at 4:27:27 AM UTC+3 mdalihu...@gmail.com wrote:

You can use the new script below; it's better than the previous two scripts. You can create the tif, gt.txt, and .box files with multiple fonts, and it also supports a breakpoint: if VS Code closes, or anything else happens while the tif, gt.txt, and .box files are being created, you can use the checkpoint to resume from where VS Code closed.
The script for the tif, gt.txt, and .box files:

    import os
    import random
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def create_training_data(training_text_file, font_list, output_directory, start_line=None, end_line=None):
        lines = []
        with open(training_text_file, 'r') as input_file:
            lines = input_file.readlines()

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        if start_line is None:
            start_line = 0

        if end_line is None:
            end_line = len(lines) - 1

        for font_name in font_list.fonts:
            for line_index in range(start_line, end_line + 1):
                line = lines[line_index].strip()

                training_text_file_name = pathlib.Path(training_text_file).stem
                line_serial = f"{line_index:d}"

                line_gt_text = os.path.join(output_directory, f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}.gt.txt')

                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
                subprocess.run([
                    'text2image',
                    f'--font={font_name}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=330',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/eng.unicharset',
                ])

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/eng.training_text'
        output_directory = 'tesstrain/data/eng-ground-truth'

        font_list = FontList()

        create_training_data(training_text_file, font_list, output_directory, args.start, args.end)

Then create a file called "FontList.py" in the root directory and paste this:
    class FontList:
        def __init__(self):
            self.fonts = [
                # note: a missing comma between adjacent string literals
                # would silently concatenate two font names into one
                "Gerlick",
                "Sagar Medium",
                "Ekushey Lohit Normal",
                "Charukola Round Head Regular, weight=433",
                "Charukola Round Head Bold, weight=443",
                "Ador Orjoma Unicode",
            ]

Then import it in the script above.

The breakpoint command:

    sudo python3 split_training_text.py --start 0 --end 11

Change the checkpoint range (--start 0 --end 11) as needed.
And the training checkpoint works as you already know.

On Monday, 11 September, 2023 at 1:22:34 am UTC+6 desal...@gmail.com wrote:

Hi mdalihu,
The script you posted here seems much more extensive than the one you posted before: https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com

I have been using your earlier script. It is magical. How is this one different from the earlier one?

Thank you for posting these scripts, by the way. They have saved me countless hours by running multiple fonts in one sweep. I was not able to find any instructions on how to train for multiple fonts, and the official manual is also unclear; your script helped me get started.

On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3 mdalihu...@gmail.com wrote:

OK, I will try as you said.
One more thing: what role does the length of the training_text lines play? I have seen that Bengali texts have long lines of words, so I want to know how many words or characters per line would be the better choice for training. And should '--xsize=3600', '--ysize=350' be set according to the number of words per line?

On Thursday, 10 August, 2023 at 1:10:14 am UTC+6 shree wrote:

Include the default fonts in your fine-tuning list of fonts as well, and see if that helps.

On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:

I have trained some new fonts with the fine-tuning method for the Bengali language in Tesseract 5, using the official trained_text, tessdata_best, and everything else. Everything is good, but the problem is that the default fonts that were trained before no longer convert text as well as they did previously, while my new fonts work well. I don't understand why this is happening.
I am sharing the code so you can see what is going on.

*codes for creating tif, gt.txt, .box files:*

import os
import random
import pathlib
import subprocess
import argparse
from FontList import FontList

def read_line_count():
    # Resume the serial number from the previous run, if any.
    if os.path.exists('line_count.txt'):
        with open('line_count.txt', 'r') as file:
            return int(file.read())
    return 0

def write_line_count(line_count):
    with open('line_count.txt', 'w') as file:
        file.write(str(line_count))

def create_training_data(training_text_file, font_list, output_directory,
                         start_line=None, end_line=None):
    lines = []
    with open(training_text_file, 'r') as input_file:
        for line in input_file.readlines():
            lines.append(line.strip())

    if not os.path.exists(output_directory):
        os.mkdir(output_directory)

    random.shuffle(lines)

    if start_line is None:
        line_count = read_line_count()  # Set the starting line_count from the file
    else:
        line_count = start_line

    if end_line is None:
        end_line_count = len(lines) - 1  # Set the ending line_count
    else:
        end_line_count = min(end_line, len(lines) - 1)

    for font in font_list.fonts:  # Iterate through all the fonts in the font_list
        font_serial = 1
        for line in lines:
            training_text_file_name = pathlib.Path(training_text_file).stem

            # Generate a unique serial number for each line
            line_serial = f"{line_count:d}"

            # GT (Ground Truth) text filename
            line_gt_text = os.path.join(
                output_directory,
                f'{training_text_file_name}_{line_serial}.gt.txt')
            with open(line_gt_text, 'w') as output_file:
                output_file.writelines([line])

            # Image filename
            file_base_name = f'ben_{line_serial}'  # Unique filename for each line
            subprocess.run([
                'text2image',
                f'--font={font}',
                f'--text={line_gt_text}',
                f'--outputbase={output_directory}/{file_base_name}',
                '--max_pages=1',
                '--strip_unrenderable_words',
                '--leading=36',
                '--xsize=3600',
                '--ysize=350',
                '--char_spacing=1.0',
                '--exposure=0',
                '--unicharset_file=langdata/ben.unicharset',
            ])

            line_count += 1
            font_serial += 1

        # Reset font_serial for the next font iteration
        font_serial = 1

    write_line_count(line_count)  # Update the line_count in the file

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
    parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
    args = parser.parse_args()

    training_text_file = 'langdata/ben.training_text'
    output_directory = 'tesstrain/data/ben-ground-truth'

    # Create an instance of the FontList class
    font_list = FontList()

    create_training_data(training_text_file, font_list,
                         output_directory, args.start, args.end)

*and for training code:*
import subprocess

# List of model names to train
font_names = ['ben']

for font in font_names:
    command = (f"TESSDATA_PREFIX=../tesseract/tessdata "
               f"make training MODEL_NAME={font} START_MODEL=ben "
               f"TESSDATA=../tesseract/tessdata "
               f"MAX_ITERATIONS=10000 LANG_TYPE=Indic")
    subprocess.run(command, shell=True)

Any suggestions on how to identify the problem?
Thanks, everyone.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/406cd733-b265-4118-a7ca-de75871cac39n%40googlegroups.com.
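[Editor's note] The generation script in this thread imports a FontList class that was never posted. A minimal sketch of what it presumably looks like is below; the font names are illustrative Bengali fonts, not the author's actual list. Following shree's advice, the default fonts used for the official ben traineddata would be listed alongside the new fonts being fine-tuned.

```python
# Hypothetical stand-in for the unposted FontList module.
class FontList:
    def __init__(self):
        # Include the fonts behind the official ben model plus the
        # new fonts being added; these names are placeholders.
        self.fonts = [
            'Lohit Bengali',
            'Mukti Narrow',
            'Bangla Medium',  # replace with the fonts you are adding
        ]

font_list = FontList()
```

The generation script only reads `font_list.fonts`, so any object exposing a list of font-name strings would work in its place.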
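[Editor's note] Since text2image can fail silently on an unrenderable line, a quick consistency check on the ground-truth directory before running `make training` can help isolate problems like the one described above. This is a sketch, not from the thread, assuming the `ben_{serial}` naming produced by the generation script:

```python
import pathlib

def unpaired_files(ground_truth_dir):
    """Return basenames that have a .gt.txt without a .tif, or vice versa."""
    d = pathlib.Path(ground_truth_dir)
    gt = {p.name[:-len('.gt.txt')] for p in d.glob('*.gt.txt')}
    tif = {p.stem for p in d.glob('*.tif')}
    # The symmetric difference holds every basename missing its partner.
    return sorted(gt ^ tif)
```

An empty result means every ground-truth line has a matching image; any names returned point at lines text2image skipped or stray files.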