Not a good result; that's why I stopped training for now. The default traineddata is overall better than training from scratch.

On Thursday, 19 October, 2023 at 11:32:08 pm UTC+6 desal...@gmail.com wrote:

Hi Ali,
How is your training going? Do you get good results with the training from scratch?

On Friday, September 15, 2023 at 6:42:26 PM UTC+3 tesseract-ocr wrote:

Yes, two months ago when I started to learn OCR I saw that. It was very helpful at the beginning.

On Friday, 15 September, 2023 at 4:01:32 pm UTC+6 desal...@gmail.com wrote:

Just saw this paper: https://osf.io/b8h7q

On Thursday, September 14, 2023 at 9:02:22 PM UTC+3 mdalihu...@gmail.com wrote:

I will try some changes. Thanks.

On Thursday, 14 September, 2023 at 2:46:36 pm UTC+6 elvi...@gmail.com wrote:

I also faced that issue on Windows. Apparently, the issue is related to Unicode. You can try your luck by opening the file with "utf8" encoding instead of plain "r" in the script. I ended up installing Ubuntu because I was having too many errors on Windows.

On Thu, Sep 14, 2023, 9:33 AM Ali hussain <mdalihu...@gmail.com> wrote:

Have you faced this error: "Can't encode transcription"? If so, how did you solve it?

On Thursday, 14 September, 2023 at 10:51:52 am UTC+6 elvi...@gmail.com wrote:

I was using my own text.

On Thu, Sep 14, 2023, 6:58 AM Ali hussain <mdalihu...@gmail.com> wrote:

Are you training from Tesseract's default text data or your own collected text data?

On Thursday, 14 September, 2023 at 12:19:53 am UTC+6 desal...@gmail.com wrote:

I now got to 200,000 iterations, and the error rate is stuck at 0.46. The result is absolutely trash: nowhere close to the default/Ray's training.

On Wednesday, September 13, 2023 at 2:47:05 PM UTC+3 mdalihu...@gmail.com wrote:

After Tesseract recognizes text from images,
then you can apply regex to replace the wrong words with the correct ones. I'm not familiar with PaddleOCR or ScanTailor either.

On Wednesday, 13 September, 2023 at 5:06:12 pm UTC+6 desal...@gmail.com wrote:

At what stage are you doing the regex replacement? My process has been: Scan (tif) --> ScanTailor --> Tesseract --> pdf.

> EasyOCR I think is best for ID cards and similar images, but for document images like books, Tesseract is better than EasyOCR.

How about PaddleOCR? Are you familiar with it?

On Wednesday, September 13, 2023 at 1:45:54 PM UTC+3 mdalihu...@gmail.com wrote:

I know what you mean, but in some cases it helps me. I have found that specific characters and words are consistently misrecognized by Tesseract, so I use these regex replacements to correct those characters and words when they come out wrong.

Here is what I have done:

    " ী": "ী",
    " ্": " ",
    " ে": " ",
    "জ্া": "জা",
    " ": " ",
    " ": " ",
    " ": " ",
    "্প": " ",
    " য": "র্য",
    "য": "য",
    " া": "া",
    "আা": "আ",
    "ম্ি": "মি",
    "স্ু": "সু",
    "হূ ": "হূ",
    " ণ": "ণ",
    "র্্": "র",
    "চিন্ত ": "চিন্তা ",
    "ন্া": "না",
    "সম ূর্ন": "সম্পূর্ণ",

On Wednesday, 13 September, 2023 at 4:18:22 pm UTC+6 desal...@gmail.com wrote:

The problem for regex is that Tesseract is not consistent in its replacement.
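A fixed replacement map like the one shown above can be applied as a small post-processing pass over the OCR output. This is a minimal sketch; the pairs here are illustrative ASCII placeholders, not the actual Bengali entries:

```python
# Apply a fixed replacement map to OCR output, trying longer keys first
# so multi-character fixes win over single-character ones.
# These pairs are illustrative placeholders, not the real map.
replacements = {
    "rn": "m",   # common OCR confusion: 'rn' misread as 'm'
    "0": "O",    # digit zero misread as capital O
}

def post_process(text: str) -> str:
    for wrong in sorted(replacements, key=len, reverse=True):
        text = text.replace(wrong, replacements[wrong])
    return text

print(post_process("c0rn"))  # -> "cOm"
```

When the engine's mistakes are inconsistent, as described here, a fixed map needs one entry per observed variant, which is exactly what makes the approach fragile.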
Think of it this way: suppose the original training of the English data didn't contain the letter /u/. What does Tesseract do when it faces /u/ in actual processing? In some cases, it replaces it with closely similar letters such as /v/ and /w/. In other cases, it completely removes it. That is what is happening in my case: those characters are sometimes completely removed; other times, they are replaced by closely resembling characters. Because of this inconsistency, applying regex is very difficult.

On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3 mdalihu...@gmail.com wrote:

If some specific characters or words are always missing from the OCR result, then you can apply logic with regular expressions in your application. After OCR, those specific characters or words will be replaced by the correct characters or words that you defined in your application with regular expressions. That can fix some major problems.

On Wednesday, 13 September, 2023 at 3:51:29 pm UTC+6 desal...@gmail.com wrote:

The characters are getting missed, even after fine-tuning. I never made any progress. I tried many different ways. Some specific characters are always missing from the OCR result.

On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3 mdalihu...@gmail.com wrote:

EasyOCR, I think, is best for ID cards and similar images,
but for document images like books, Tesseract is better than EasyOCR. I haven't actually used EasyOCR myself, though; you can try it.

I have added dictionary words, but the result is the same.

What kind of problem did you face when fine-tuning with a few new characters, as you said ("but, I failed in every possible way to introduce a few new characters into the database")?

On Wednesday, 13 September, 2023 at 3:33:48 pm UTC+6 desal...@gmail.com wrote:

Yes, we are new to this. I find the instructions (the manual) very hard to follow. The video you linked above was really helpful to get started. My plan at the beginning was to fine-tune the existing .traineddata, but I failed in every possible way to introduce a few new characters into the database. That is why I started from scratch.

Sure, I will follow Lorenzo's suggestion: I will run more iterations and see if I can improve.

Another area we need to explore is the use of dictionaries. Maybe adding millions of words to the dictionary could help Tesseract. I don't have millions of words, but I am looking into some corpora to get more words into the dictionary.

If this all fails, EasyOCR (and probably other similar open-source packages) is probably our next option to try. Sure, sharing our experiences will be helpful. I will let you know if I make good progress with any of these options.

On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3 mdalihu...@gmail.com wrote:

> How is your training going for Bengali?

It was nearly good, but I faced spacing problems between words: some words get a space, but most of them have none. I think the problem is in the dataset, but I use the default training dataset from Tesseract that is used for ben, so I am confused and have to explore more. By the way, you can try what Lorenzo Blz said. Training from scratch is actually harder than fine-tuning, so you can experiment with different datasets. If you succeed, please let me know how you did the whole process. I'm also new to this field.

On Wednesday, 13 September, 2023 at 1:13:43 pm UTC+6 desal...@gmail.com wrote:

How is your training going for Bengali?
I have been trying to train from scratch. I made about 64,000 lines of text (which produced about 255,000 files in the end) and ran the training for 150,000 iterations, getting a 0.51 training error rate. I was hoping for reasonable accuracy. Unfortunately, when I run the OCR using the .traineddata, the accuracy is absolutely terrible. Do you think I made some mistakes, or is that an expected result?
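One way to answer that question objectively is to measure the character error rate (CER) of the finished .traineddata on held-out lines, rather than eyeballing the output. A minimal sketch in pure Python (no external libraries); in practice the two strings would be the OCR output and the matching .gt.txt line:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ocr_text: str, ground_truth: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(ocr_text, ground_truth) / max(len(ground_truth), 1)

print(cer("kitten", "sitting"))  # 3 edits over 7 reference chars, about 0.43
```

Averaging this over a held-out set gives a number directly comparable with the training error rate reported by lstmtraining.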
On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3 mdalihu...@gmail.com wrote:

Yes, he doesn't mention all fonts, only one font. That's why he didn't use MODEL_NAME in a separate script file, I think.

Actually, here we train on all the tif, gt.txt, and .box files that are created under MODEL_NAME (I mean the eng, ben, or oro language code), because when we first create the tif, gt.txt, and .box files, every file name starts with MODEL_NAME. This MODEL_NAME is what we select in the training script for looping over each tif, gt.txt, and .box file created by MODEL_NAME.

On Tuesday, 12 September, 2023 at 9:42:13 pm UTC+6 desal...@gmail.com wrote:

Yes, I am familiar with the video and have set up the folder structure as you did. Indeed, I have tried a number of fine-tuning runs with a single font following Garcia's video. But your script is much better because it supports multiple fonts. The whole improvement you made is brilliant and very useful. It is all working for me.

The only part that I didn't understand is the trick you used in your tesseract_train.py script. You see, I have been doing exactly what you did except for this script.
The script seems to have the trick of sending/teaching each of the fonts (iteratively) into the model. The script I have been using (which I got from Garcia) doesn't mention fonts at all:

    TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000

Does that mean my model doesn't train on the fonts (even though the fonts were included in the splitting process, in the other script)?

On Monday, September 11, 2023 at 10:54:08 AM UTC+3 mdalihu...@gmail.com wrote:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

1. This is the training command; I have saved it as 'tesseract_training.py' inside the tesstrain folder.
2. The root directory means your main training folder, which contains the langdata, tesseract, and tesstrain folders;
if you watch this tutorial, https://www.youtube.com/watch?v=KE4xEzFGSU8, you will understand the folder structure better. The only differences are that I created tesseract_training.py in the tesstrain folder for training, and the FontList.py file sits at the main path alongside langdata, tesseract, tesstrain, and split_training_text.py.
3. First of all, you have to put all the fonts in your Linux fonts folder, /usr/share/fonts/, then run sudo apt update and then sudo fc-cache -fv. After that, you have to add the exact font names to the FontList.py file, like mine.

I have attached two pictures of my folder structure: the first is the main structure, and the second is the expanded tesstrain folder.

[image: Screenshot 2023-09-11 134947.png] [image: Screenshot 2023-09-11 135014.png]

On Monday, 11 September, 2023 at 12:50:03 pm UTC+6 desal...@gmail.com wrote:

Thank you so much for putting out these brilliant scripts. They make the process much more efficient.

I have one more question on the other script that you use to train.
    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

Do you have the names of the fonts listed in a file in the same/root directory? How do you set up the names of the fonts in that file, if you don't mind sharing it?

On Monday, September 11, 2023 at 4:27:27 AM UTC+3 mdalihu...@gmail.com wrote:

You can use the new script below; it's better than the previous two scripts. You can create the tif, gt.txt, and .box files with multiple fonts, and it also supports a breakpoint: if VS Code closes (or anything else interrupts it) while creating the tif, gt.txt, and .box files, you can use the checkpoint to resume from where it stopped.
Command for the tif, gt.txt, and .box files:

    import os
    import random
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def create_training_data(training_text_file, font_list, output_directory,
                             start_line=None, end_line=None):
        lines = []
        with open(training_text_file, 'r') as input_file:
            lines = input_file.readlines()

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        if start_line is None:
            start_line = 0

        if end_line is None:
            end_line = len(lines) - 1

        for font_name in font_list.fonts:
            for line_index in range(start_line, end_line + 1):
                line = lines[line_index].strip()

                training_text_file_name = pathlib.Path(training_text_file).stem

                line_serial = f"{line_index:d}"

                line_gt_text = os.path.join(
                    output_directory,
                    f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}.gt.txt')

                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
                subprocess.run([
                    'text2image',
                    f'--font={font_name}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=330',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/eng.unicharset',
                ])

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/eng.training_text'
        output_directory = 'tesstrain/data/eng-ground-truth'

        font_list = FontList()

        create_training_data(training_text_file, font_list, output_directory,
                             args.start, args.end)

Then create a file called "FontList.py" in the root directory and paste this in:

    class FontList:
        def __init__(self):
            self.fonts = [
                "Gerlick",
                "Sagar Medium",
                "Ekushey Lohit Normal",
                "Charukola Round Head Regular, weight=433",
                "Charukola Round Head Bold, weight=443",
                "Ador Orjoma Unicode",
            ]

Then import it in the script above.

For the breakpoint command:

    sudo python3 split_training_text.py --start 0 --end 11

Change the checkpoint range (--start 0 --end 11) as needed.
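Since --start and --end are inclusive line indices, splitting a long training text into resumable chunks is just arithmetic. A small helper could generate the commands (the chunk size here is an arbitrary example, not a recommendation):

```python
def chunk_ranges(total_lines, chunk_size):
    """Yield inclusive (start, end) pairs covering total_lines lines,
    matching the --start/--end convention of split_training_text.py."""
    for start in range(0, total_lines, chunk_size):
        yield start, min(start + chunk_size - 1, total_lines - 1)

# e.g. 64,000 training lines in chunks of 5,000 lines each:
for start, end in chunk_ranges(64000, 5000):
    print(f"python3 split_training_text.py --start {start} --end {end}")
```

If a run dies partway through a chunk, rerunning just that chunk regenerates only its files, leaving the rest of the ground truth untouched.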
And the training checkpoint works as you already know.

On Monday, 11 September, 2023 at 1:22:34 am UTC+6 desal...@gmail.com wrote:

Hi mdalihu,
The script you posted here seems much more extensive than the one you posted before: https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com

I have been using your earlier script. It is magical. How is this one different from the earlier one?

Thank you for posting these scripts, by the way. They have saved me countless hours by running multiple fonts in one sweep. I was not able to find any instructions on how to train for multiple fonts, and the official manual is also unclear. Your script helped me get started.

On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3 mdalihu...@gmail.com wrote:

OK, I will try as you said. One more thing: what role do the training_text lines play? I have seen that Bengali text has long lines of words, so I want to know how many words or characters per line would be the better choice for training,
and should '--xsize=3600', '--ysize=350' be set according to the words per line?

On Thursday, 10 August, 2023 at 1:10:14 am UTC+6 shree wrote:

Include the default fonts in your fine-tuning list of fonts as well and see if that helps.

On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:

I have trained some new fonts with the fine-tuning method for the Bengali language in Tesseract 5, using the official training_text, tessdata_best, and everything else. It all works, except for one problem: the default fonts that were trained before no longer convert text as well as they previously did, while my new fonts work well. I don't understand why this is happening. I'm sharing the code to help show what's going on.
Code for creating the tif, gt.txt, and .box files:

    import os
    import random
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def read_line_count():
        if os.path.exists('line_count.txt'):
            with open('line_count.txt', 'r') as file:
                return int(file.read())
        return 0

    def write_line_count(line_count):
        with open('line_count.txt', 'w') as file:
            file.write(str(line_count))

    def create_training_data(training_text_file, font_list, output_directory,
                             start_line=None, end_line=None):
        lines = []
        with open(training_text_file, 'r') as input_file:
            for line in input_file.readlines():
                lines.append(line.strip())

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        random.shuffle(lines)

        if start_line is None:
            line_count = read_line_count()  # Set the starting line_count from the file
        else:
            line_count = start_line

        if end_line is None:
            end_line_count = len(lines) - 1  # Set the ending line_count
        else:
            end_line_count = min(end_line, len(lines) - 1)

        for font in font_list.fonts:  # Iterate through all the fonts in the font_list
            font_serial = 1
            for line in lines:
                training_text_file_name = pathlib.Path(training_text_file).stem

                # Generate a unique serial number for each line
                line_serial = f"{line_count:d}"

                # GT (Ground Truth) text filename
                line_gt_text = os.path.join(
                    output_directory, f'{training_text_file_name}_{line_serial}.gt.txt')
                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                # Image filename
                file_base_name = f'ben_{line_serial}'  # Unique filename for each font
                subprocess.run([
                    'text2image',
                    f'--font={font}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=350',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/ben.unicharset',
                ])

                line_count += 1
                font_serial += 1

            # Reset font_serial for the next font iteration
            font_serial = 1

        write_line_count(line_count)  # Update the line_count in the file

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/ben.training_text'
        output_directory = 'tesstrain/data/ben-ground-truth'

        # Create an instance of the FontList class
        font_list = FontList()

        create_training_data(training_text_file, font_list, output_directory,
                             args.start, args.end)

And the training code:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000 LANG_TYPE=Indic"
        subprocess.run(command, shell=True)

Any suggestions on how to identify and isolate the problem?
Thanks, everyone.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/406cd733-b265-4118-a7ca-de75871cac39n%40googlegroups.com