Thanks. I will try this method as soon as possible.

On Sunday, 22 October, 2023 at 3:49:46 pm UTC+6 desal...@gmail.com wrote:
Here it is: https://github.com/tesseract-ocr/tessdoc/blob/main/Data-Files-in-tessdata_best.md

On Sunday, October 22, 2023 at 12:45:40 PM UTC+3 Des Bw wrote:

This is the command I used to train from a layer:

make training MODEL_NAME=amh START_MODEL=amh APPEND_INDEX=5 NET_SPEC='[Lfx256 O1c105]' TESSDATA=../tesseract/tessdata EPOCHS=3 TARGET_ERROR_RATE=0.0001 > data/amh.log &

- I took it from Shreeshrii's training repo tesstrain-JSTORArabic: https://github.com/Shreeshrii/tesstrain-JSTORArabic

- The net_spec of ben might not be the same as amh. Shreeshrii has posted a link to the netspecs of the languages in this forum.

On Sunday, October 22, 2023 at 12:09:25 PM UTC+3 mdalihu...@gmail.com wrote:

You can test by changing '--char_spacing=1.0'. I think it would also be a problem for the accuracy of the result.

On Sunday, 22 October, 2023 at 3:07:16 pm UTC+6 Ali hussain wrote:

I haven't tried cutting the top layer of the network. Can you share what you did to cut the top layer of the network, or a GitHub project link?

On Sunday, 22 October, 2023 at 12:27:32 pm UTC+6 desal...@gmail.com wrote:

That is massive data. Have you tried to train by cutting the top layer of the network? I think that is the most promising approach. I was getting really good results with it, but the results do not carry over to scanned documents; I get the best results with the synthetic data. I am now experimenting with the settings in text2image to see whether it is possible to emulate scanned documents.

I also suspect that the setting '--char_spacing=1.0' in our setup is causing more trouble. Scanned documents come with character spacing close to zero. If you are planning to train more, try removing this parameter.
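Removing '--char_spacing' for a comparison run is easier if the text2image argument list is built in one place. A sketch of a small helper (the function name is mine; the flags are the ones already used by the generation script posted later in this thread):

```python
def text2image_args(font, gt_file, outputbase, char_spacing=None):
    """Build the text2image argument list; omit --char_spacing when it is None."""
    args = [
        'text2image',
        f'--font={font}',
        f'--text={gt_file}',
        f'--outputbase={outputbase}',
        '--max_pages=1',
        '--strip_unrenderable_words',
        '--xsize=3600',
        '--ysize=330',
        '--exposure=0',
    ]
    if char_spacing is not None:
        args.append(f'--char_spacing={char_spacing}')
    return args

# With spacing (current setup) vs. without (closer to scanned documents):
with_spacing = text2image_args('Sagar Medium', 'line.gt.txt', 'out/line', char_spacing=1.0)
without_spacing = text2image_args('Sagar Medium', 'line.gt.txt', 'out/line')
```

Passing either list to subprocess.run() then behaves exactly like the hard-coded call, so the two variants can be generated side by side for comparison.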
On Sunday, October 22, 2023 at 4:09:46 AM UTC+3 mdalihu...@gmail.com wrote:

600,000 lines of text, and the iterations were higher than 600,000. But sometimes I got a better result at lower iterations when fine-tuning, e.g. 100,000 lines of text and only 5,000 to 10,000 iterations.

On Saturday, 21 October, 2023 at 11:37:13 am UTC+6 desal...@gmail.com wrote:

How many lines of text and iterations did you use?

On Saturday, October 21, 2023 at 8:36:38 AM UTC+3 Des Bw wrote:

Yeah, that is what I am getting as well. I was able to add the missing letter, but the overall accuracy became lower than the default model.

On Saturday, October 21, 2023 at 3:22:44 AM UTC+3 mdalihu...@gmail.com wrote:

Not a good result; that's why I have stopped training for now. The default traineddata is better overall than training from scratch.

On Thursday, 19 October, 2023 at 11:32:08 pm UTC+6 desal...@gmail.com wrote:

Hi Ali,
How is your training going? Are you getting good results with training from scratch?

On Friday, September 15, 2023 at 6:42:26 PM UTC+3 tesseract-ocr wrote:

Yes, I saw that two months ago when I started to learn OCR. It was very helpful at the beginning.

On Friday, 15 September, 2023 at 4:01:32 pm UTC+6 desal...@gmail.com wrote:

Just saw this paper: https://osf.io/b8h7q

On Thursday, September 14, 2023 at 9:02:22 PM UTC+3 mdalihu...@gmail.com wrote:

I will try some changes. Thanks.

On Thursday, 14 September, 2023 at 2:46:36 pm UTC+6 elvi...@gmail.com wrote:

I also faced that issue on Windows. Apparently, the issue is related to Unicode. You can try your luck by changing "r" to "utf8" in the script. I ended up installing Ubuntu because I was having too many errors on Windows.

On Thu, Sep 14, 2023, 9:33 AM Ali hussain <mdalihu...@gmail.com> wrote:

Have you faced this error: "Can't encode transcription"? If so, how did you solve it?

On Thursday, 14 September, 2023 at 10:51:52 am UTC+6 elvi...@gmail.com wrote:

I was using my own text.

On Thu, Sep 14, 2023, 6:58 AM Ali hussain <mdalihu...@gmail.com> wrote:

Are you training from Tesseract's default text data or your own collected text data?

On Thursday, 14 September, 2023 at 12:19:53 am UTC+6 desal...@gmail.com wrote:

I have now reached 200,000 iterations, and the error rate is stuck at 0.46. The result is absolutely trash: nowhere close to the default/Ray's training.

On Wednesday, September 13, 2023 at 2:47:05 PM UTC+3 mdalihu...@gmail.com wrote:

After Tesseract recognizes text from images, you can apply regex to replace the wrong words with the correct words. I'm not familiar with PaddleOCR or ScanTailor either.

On Wednesday, 13 September, 2023 at 5:06:12 pm UTC+6 desal...@gmail.com wrote:

At what stage are you doing the regex replacement?
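The "r"-to-"utf8" fix mentioned above refers to the arguments of Python's open(): on Windows the default text encoding is the locale codec (often cp1252), which cannot decode Bengali training text and produces errors like "Can't encode transcription". A minimal sketch of the explicit form:

```python
import os
import tempfile

# Write a Bengali line the way the training text is stored (UTF-8).
path = os.path.join(tempfile.mkdtemp(), 'ben.training_text')
with open(path, 'w', encoding='utf-8') as f:
    f.write('সম্পূর্ণ\n')

# open(path, 'r') uses the platform default codec and can raise
# UnicodeDecodeError on Windows; pinning the encoding is portable:
with open(path, 'r', encoding='utf-8') as f:
    line = f.read().strip()
```

The same encoding='utf-8' argument applies to every open() call in the generation scripts, both when reading the training text and when writing the .gt.txt files.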
My process has been: Scan (tif) --> ScanTailor --> Tesseract --> pdf.

> EasyOCR I think is best for ID cards or that kind of image processing. But for document images like books, Tesseract is better than EasyOCR.

How about PaddleOCR? Are you familiar with it?

On Wednesday, September 13, 2023 at 1:45:54 PM UTC+3 mdalihu...@gmail.com wrote:

I know what you mean, but in some cases it helps me. I have found that specific characters and words are consistently not recognized by Tesseract, so I use these regexes to replace those characters and words when they come out wrong.

See what I have done:

" ী": "ী",
" ্": " ",
" ে": " ",
"জ্া": "জা",
" ": " ",
" ": " ",
" ": " ",
"্প": " ",
" য": "র্য",
"য": "য",
" া": "া",
"আা": "আ",
"ম্ি": "মি",
"স্ু": "সু",
"হূ ": "হূ",
" ণ": "ণ",
"র্্": "র",
"চিন্ত ": "চিন্তা ",
"ন্া": "না",
"সম ূর্ন": "সম্পূর্ণ",

On Wednesday, 13 September, 2023 at 4:18:22 pm UTC+6 desal...@gmail.com wrote:

The problem for regex is that Tesseract is not consistent in its replacement. Suppose the original English training data doesn't contain the letter /u/. What does Tesseract do when it encounters /u/ in actual processing? In some cases it replaces it with closely similar letters such as /v/ and /w/; in other cases it removes it completely. That is what is happening in my case: those characters are sometimes completely removed, and other times replaced by closely resembling characters. Because of this inconsistency, applying regex is very difficult.

On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3 mdalihu...@gmail.com wrote:

If some specific characters or words are always missing from the OCR result, you can apply regular-expression logic in your application. After OCR, those specific characters or words are replaced by the correct characters or words you defined. It can fix some major problems.

On Wednesday, 13 September, 2023 at 3:51:29 pm UTC+6 desal...@gmail.com wrote:

The characters are getting missed even after fine-tuning. I never made any progress; I tried many different ways.
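The replacement table above can be applied as a single post-OCR pass. A sketch (the helper name is mine; in practice the table would hold the Bengali pairs listed above, and the /u/-to-/v/ inconsistency described here means such a table can only catch the systematic errors, not the random ones):

```python
import re

def postprocess(text, table):
    # Apply each correction pair in order; keys are the recurring OCR
    # errors, values are the intended characters or words.
    for wrong, right in table.items():
        text = re.sub(re.escape(wrong), right, text)
    return text

# ASCII stand-ins for the kind of systematic confusions discussed here:
table = {"vv": "w", "tbe": "the"}
fixed = postprocess("tbe vvord", table)
```

re.escape keeps literal keys safe even when they contain regex metacharacters; if a key is meant to be a real pattern, drop the escaping for that entry.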
Some specific characters are always missing from the OCR result.

On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3 mdalihu...@gmail.com wrote:

EasyOCR, I think, is best for ID cards and that kind of image processing, but for document images like books, Tesseract is better than EasyOCR. I haven't actually used EasyOCR myself; you can try it.

I have added dictionary words, but the result is the same.

What kind of problem did you face when fine-tuning for a few new characters, as you said ("but, I failed in every possible way to introduce a few new characters into the database")?

On Wednesday, 13 September, 2023 at 3:33:48 pm UTC+6 desal...@gmail.com wrote:

Yes, we are new to this. I find the instructions (the manual) very hard to follow. The video you linked above was really helpful to get started. My plan at the beginning was to fine-tune the existing .traineddata, but I failed in every possible way to introduce a few new characters into the database. That is why I started from scratch.

Sure, I will follow Lorenzo's suggestion: I will run more iterations and see if I can improve.
Another area we need to explore is the use of dictionaries. Maybe adding millions of words to the dictionary could help Tesseract. I don't have millions of words, but I am looking into some corpora to get more words into the dictionary.

If this all fails, EasyOCR (and probably other similar open-source packages) is our next option to try. Sure, sharing our experiences will be helpful. I will let you know if I make good progress with any of these options.

On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3 mdalihu...@gmail.com wrote:

> How is your training going for Bengali?

It was nearly good, but I faced spacing problems between words: some words get spaces, but most of them have none. I think the problem is in the dataset, but I used the default training dataset from Tesseract, the one used in ben, so I am confused and have to explore more. By the way, you can try what Lorenzo Blz said. Training from scratch is actually harder than fine-tuning, so you can use different datasets to explore. If you succeed, please let me know how you did the whole process. I'm also new to this field.

On Wednesday, 13 September, 2023 at 1:13:43 pm UTC+6 desal...@gmail.com wrote:

How is your training going for Bengali?
I have been trying to train from scratch. I made about 64,000 lines of text (which produced about 255,000 files in the end) and ran the training for 150,000 iterations, getting a 0.51 training error rate. I was hoping for reasonable accuracy. Unfortunately, when I run OCR using the resulting .traineddata, the accuracy is absolutely terrible. Do you think I made some mistakes, or is that an expected result?

On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3 mdalihu...@gmail.com wrote:

Yes, he doesn't mention all the fonts, only one font. That's why he didn't use MODEL_NAME in a separate script file, I think.

Actually, here we teach all the tif, gt.txt, and .box files which are created with MODEL_NAME (I mean the eng, ben, oro flag, or language code), because when we first create the tif, gt.txt, and .box files, every file starts with MODEL_NAME.
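Since every rendered .tif must pair with a .gt.txt transcription of the same base name, a quick scan of the ground-truth directory can catch silent rendering failures before a long training run. A sketch (the function name is mine):

```python
def unmatched_tifs(filenames):
    """Return .tif files that have no matching .gt.txt transcription."""
    names = set(filenames)
    return sorted(f for f in filenames
                  if f.endswith('.tif')
                  and f[:-len('.tif')] + '.gt.txt' not in names)

# In practice the list would come from
# os.listdir('tesstrain/data/eng-ground-truth'); sample names follow the
# MODEL_NAME_lineserial_font convention described in this thread:
files = ['eng_0_Sagar_Medium.tif', 'eng_0_Sagar_Medium.gt.txt',
         'eng_1_Sagar_Medium.tif']
orphans = unmatched_tifs(files)
```

Any name in the result is a line where text2image produced no usable page (or the .gt.txt write failed) and should be regenerated or removed before training.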
This MODEL_NAME is what we select in the training script to loop over each of the tif, gt.txt, and .box files created with MODEL_NAME.

On Tuesday, 12 September, 2023 at 9:42:13 pm UTC+6 desal...@gmail.com wrote:

Yes, I am familiar with the video and have set up the folder structure as you did. Indeed, I have tried a number of fine-tunings with a single font following Garcia's video. But your script is much better because it supports multiple fonts. The whole improvement you made is brilliant and very useful; it is all working for me.

The only part I didn't understand is the trick you used in your tesseract_train.py script. You see, I have been doing exactly what you did, except for this script.

The script seems to have the trick of sending/teaching each of the fonts (iteratively) into the model. The script I have been using (which I got from Garcia) doesn't mention fonts at all.
TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000

Does it mean that my model doesn't train the fonts (even though the fonts have been included in the splitting process, in the other script)?

On Monday, September 11, 2023 at 10:54:08 AM UTC+3 mdalihu...@gmail.com wrote:

import subprocess

# List of font names
font_names = ['ben']
for font in font_names:
    command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
    subprocess.run(command, shell=True)

1. This is the training command; I have put it in a file named 'tesseract_training.py' inside the tesstrain folder.

2. The root directory means your main training folder, with the langdata, tesseract, and tesstrain folders inside it. If you watch this tutorial, https://www.youtube.com/watch?v=KE4xEzFGSU8, you will understand the folder structure better. I only created tesseract_training.py in the tesstrain folder for training; the FontList.py file is in the main path, alongside langdata, tesseract, tesstrain, and split_training_text.py.

3. First of all, you have to put all the fonts in your Linux fonts folder, /usr/share/fonts/, then run sudo apt update and then sudo fc-cache -fv. After that, you have to add the exact font names to the FontList.py file, like me.

I have attached two pictures of my folder structure: the first is the main structure, and the second is the expanded tesstrain folder.
[image: Screenshot 2023-09-11 134947.png] [image: Screenshot 2023-09-11 135014.png]

On Monday, 11 September, 2023 at 12:50:03 pm UTC+6 desal...@gmail.com wrote:

Thank you so much for putting out these brilliant scripts. They make the process much more efficient.

I have one more question on the other script that you use to train.
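A quick way to catch typos in the FontList entries before a long rendering run is to compare them against the installed family names (on Linux, `fc-list : family` prints one comma-separated family list per font file). A sketch with the comparison factored into a pure function (the helper name is mine):

```python
def missing_fonts(wanted, fc_list_lines):
    """Return wanted font names whose family is absent from fc-list output."""
    installed = {family.strip()
                 for line in fc_list_lines
                 for family in line.split(',')}
    # text2image font specs may carry attributes such as ", weight=433";
    # only the family part is matched against fc-list.
    return sorted(w for w in wanted if w.split(',')[0].strip() not in installed)

# fc_list_lines would come from something like:
#   subprocess.run(['fc-list', ':', 'family'],
#                  capture_output=True, text=True).stdout.splitlines()
sample = ['Sagar Medium', 'Ekushey Lohit Normal']
gaps = missing_fonts(['Sagar Medium', 'Gerlick'], sample)
```

Running this before create_training_data avoids discovering a misspelled font only after text2image has silently fallen back or failed on thousands of lines.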
import subprocess

# List of font names
font_names = ['ben']
for font in font_names:
    command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
    subprocess.run(command, shell=True)

Do you have the names of the fonts listed in a file in the same/root directory? How do you set up the names of the fonts in that file, if you don't mind sharing it?

On Monday, September 11, 2023 at 4:27:27 AM UTC+3 mdalihu...@gmail.com wrote:

You can use the new script below; it's better than the previous two scripts. You can create the tif, gt.txt, and .box files with multiple fonts, and it also supports a breakpoint: if VS Code closes, or anything happens while creating the tif, gt.txt, and .box files, you can use the checkpoint to resume from where VS Code closed.
Command for the tif, gt.txt, and .box files:

import os
import random
import pathlib
import subprocess
import argparse
from FontList import FontList


def create_training_data(training_text_file, font_list, output_directory,
                         start_line=None, end_line=None):
    lines = []
    with open(training_text_file, 'r') as input_file:
        lines = input_file.readlines()

    if not os.path.exists(output_directory):
        os.mkdir(output_directory)

    if start_line is None:
        start_line = 0

    if end_line is None:
        end_line = len(lines) - 1

    for font_name in font_list.fonts:
        for line_index in range(start_line, end_line + 1):
            line = lines[line_index].strip()

            training_text_file_name = pathlib.Path(training_text_file).stem
            line_serial = f"{line_index:d}"

            line_gt_text = os.path.join(
                output_directory,
                f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}.gt.txt')

            with open(line_gt_text, 'w') as output_file:
                output_file.writelines([line])

            file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
            subprocess.run([
                'text2image',
                f'--font={font_name}',
                f'--text={line_gt_text}',
                f'--outputbase={output_directory}/{file_base_name}',
                '--max_pages=1',
                '--strip_unrenderable_words',
                '--leading=36',
                '--xsize=3600',
                '--ysize=330',
                '--char_spacing=1.0',
                '--exposure=0',
                '--unicharset_file=langdata/eng.unicharset',
            ])


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
    parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
    args = parser.parse_args()

    training_text_file = 'langdata/eng.training_text'
    output_directory = 'tesstrain/data/eng-ground-truth'

    font_list = FontList()

    create_training_data(training_text_file, font_list, output_directory,
                         args.start, args.end)

Then create a file called "FontList" in the root directory and paste this:
class FontList:
    def __init__(self):
        self.fonts = [
            "Gerlick",
            "Sagar Medium",
            "Ekushey Lohit Normal",
            "Charukola Round Head Regular, weight=433",
            "Charukola Round Head Bold, weight=443",
            "Ador Orjoma Unicode",
        ]

Then import it in the script above.

For the breakpoint, the command is:

sudo python3 split_training_text.py --start 0 --end 11

Change the checkpoint range (--start 0 --end 11) as needed.
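The --start/--end batches can be chained mechanically so each run resumes right after the last completed line. A sketch of the range arithmetic (the helper name is mine; last_done would come from a progress file such as the line_count.txt used by the script later in this thread):

```python
def next_range(last_done, batch_size, total_lines):
    """Compute the next inclusive --start/--end pair, or None when finished."""
    start = last_done + 1
    if start >= total_lines:
        return None
    end = min(start + batch_size - 1, total_lines - 1)
    return start, end

# After finishing --start 0 --end 11 on a 100-line training text:
nxt = next_range(11, 12, 100)
```

The returned pair plugs straight into the command line, e.g. `--start 12 --end 23`, and the final batch is clamped to the last line of the file.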
And the training checkpoint works as you already know.

On Monday, 11 September, 2023 at 1:22:34 am UTC+6 desal...@gmail.com wrote:

Hi mhalidu,
The script you posted here seems much more extensive than the one you posted before: https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com.

I have been using your earlier script. It is magical. How is this one different from the earlier one?

Thank you for posting these scripts, by the way. They have saved me countless hours by running multiple fonts in one sweep. I was not able to find any instructions on how to train for multiple fonts, and the official manual is also unclear. Your script helped me get started.

On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3 mdalihu...@gmail.com wrote:

OK, I will try as you said.
One more thing: what role do the training_text lines play? I have seen that Bengali texts have long lines of words, so I want to know how many words or characters per line would be the better choice for training. And should '--xsize=3600', '--ysize=350' be set according to the number of words per line?

On Thursday, 10 August, 2023 at 1:10:14 am UTC+6 shree wrote:

Include the default fonts also in your fine-tuning list of fonts and see if that helps.

On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:

I have trained some new fonts with the fine-tuning method for the Bengali language in Tesseract 5, using all the official trained_text, tessdata_best, and the other pieces as well. Everything is good, but the problem is that the default fonts that were trained before no longer convert text as they did previously, while my new fonts work well. I don't understand why it's happening.
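Shree's suggestion of keeping the default fonts in the fine-tuning list alongside the new ones can be applied directly to the FontList class; a sketch (the default-font name here is a placeholder, not necessarily one of the fonts the stock ben model was trained on):

```python
def combined_fonts(default_fonts, new_fonts):
    # Defaults first, duplicates dropped, so the previously trained fonts
    # stay in the rendering loop while the new fonts are added on top.
    seen, merged = set(), []
    for font in list(default_fonts) + list(new_fonts):
        if font not in seen:
            seen.add(font)
            merged.append(font)
    return merged

fonts = combined_fonts(['Vrinda'], ['Sagar Medium', 'Vrinda'])
```

Assigning the merged list to self.fonts in FontList keeps the fine-tuning data balanced between old and new fonts, which is the point of the advice above: the model keeps seeing the default fonts it already knows.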
I share the code below so you can see what is going on.

*Code for creating the .tif, .gt.txt, and .box files:*

```python
import os
import random
import pathlib
import subprocess
import argparse

from FontList import FontList


def read_line_count():
    if os.path.exists('line_count.txt'):
        with open('line_count.txt', 'r') as file:
            return int(file.read())
    return 0


def write_line_count(line_count):
    with open('line_count.txt', 'w') as file:
        file.write(str(line_count))


def create_training_data(training_text_file, font_list, output_directory,
                         start_line=None, end_line=None):
    lines = []
    with open(training_text_file, 'r') as input_file:
        for line in input_file.readlines():
            lines.append(line.strip())

    if not os.path.exists(output_directory):
        os.mkdir(output_directory)

    random.shuffle(lines)

    if start_line is None:
        line_count = read_line_count()  # Resume the line_count saved in the file
    else:
        line_count = start_line

    if end_line is None:
        end_line_count = len(lines) - 1  # Set the ending line_count
    else:
        end_line_count = min(end_line, len(lines) - 1)
    # NOTE: end_line_count is computed but the loop below still renders
    # every line of the training text.

    for font in font_list.fonts:  # Iterate through all the fonts in the font_list
        font_serial = 1
        for line in lines:
            training_text_file_name = pathlib.Path(training_text_file).stem

            # Generate a unique serial number for each line
            line_serial = f"{line_count:d}"

            # GT (ground truth) text filename
            line_gt_text = os.path.join(
                output_directory,
                f'{training_text_file_name}_{line_serial}.gt.txt')
            with open(line_gt_text, 'w') as output_file:
                output_file.writelines([line])

            # Image filename (unique per line)
            file_base_name = f'ben_{line_serial}'
            subprocess.run([
                'text2image',
                f'--font={font}',
                f'--text={line_gt_text}',
                f'--outputbase={output_directory}/{file_base_name}',
                '--max_pages=1',
                '--strip_unrenderable_words',
                '--leading=36',
                '--xsize=3600',
                '--ysize=350',
                '--char_spacing=1.0',
                '--exposure=0',
                '--unicharset_file=langdata/ben.unicharset',
            ])

            line_count += 1
            font_serial += 1

        # Reset font_serial for the next font iteration
        font_serial = 1

    write_line_count(line_count)  # Update the line_count in the file


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--start', type=int,
                        help='Starting line count (inclusive)')
    parser.add_argument('--end', type=int,
                        help='Ending line count (inclusive)')
    args = parser.parse_args()

    training_text_file = 'langdata/ben.training_text'
    output_directory = 'tesstrain/data/ben-ground-truth'

    # Create an instance of the FontList class
    font_list = FontList()

    create_training_data(training_text_file, font_list, output_directory,
                         args.start, args.end)
```
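On the earlier question about line length versus '--xsize': one option is to pre-split very long training_text lines before rendering, so each rendered line fits comfortably inside the canvas. A minimal sketch, where the 80-character budget and splitting on whitespace are my assumptions to tune (not tesstrain requirements):

```python
import textwrap

def split_long_lines(lines, max_chars=80):
    """Pre-split long training_text lines so each rendered line fits
    comfortably inside the --xsize canvas.

    max_chars=80 is a hypothetical budget; tune it by rendering a few
    sample lines and checking the resulting .tif width.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # break_long_words=False keeps whole words intact, which matters
        # for Bengali conjunct clusters; a chunk can exceed max_chars only
        # when a single word is longer than the budget.
        out.extend(textwrap.wrap(line, width=max_chars,
                                 break_long_words=False))
    return out
```

The split text can then be fed to the generation script above in place of the raw training_text lines.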
*And the training code:*

```python
import subprocess

# List of model names to train
font_names = ['ben']

for font in font_names:
    command = (
        f"TESSDATA_PREFIX=../tesseract/tessdata "
        f"make training MODEL_NAME={font} START_MODEL=ben "
        f"TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000 "
        f"LANG_TYPE=Indic"
    )
    subprocess.run(command, shell=True)
```

Any suggestions on how to identify the problem?
Thanks, everyone.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/406cd733-b265-4118-a7ca-de75871cac39n%40googlegroups.com.
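As for identifying the problem: before suspecting the training run itself, it can help to confirm that the generated ground truth is consistent, since tesstrain pairs files by shared base name (foo.gt.txt with foo.tif). A small sketch of such a check (the check is my addition, not part of the scripts above):

```python
import os

def find_orphans(ground_truth_dir):
    """Return (.gt.txt files missing a .tif, .tif files missing a .gt.txt),
    matching on the shared base name the way tesstrain pairs them."""
    names = os.listdir(ground_truth_dir)
    gt = {n[:-len('.gt.txt')] for n in names if n.endswith('.gt.txt')}
    tif = {n[:-len('.tif')] for n in names if n.endswith('.tif')}
    return sorted(gt - tif), sorted(tif - gt)
```

Orphaned .gt.txt files suggest text2image produced no image for that line (for example, when --strip_unrenderable_words removes every word); removing the orphans before running make training avoids confusing failures.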