The problem for regex is that Tesseract is not consistent in its replacements. Suppose the original English training data doesn't contain the letter /u/. What does Tesseract do when it encounters /u/ during actual processing? In some cases it replaces it with a closely similar letter such as /v/ or /w/; in other cases it removes it completely. That is exactly what is happening in my case: those characters are sometimes removed entirely, and other times replaced by closely resembling characters. Because of this inconsistency, applying regex is very difficult.
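One way to cope with this inconsistency (a sketch, not something from the thread) is to build patterns that tolerate both failure modes at once: each confusable character may appear as any of its look-alikes, or be missing entirely. The confusion set below (/u/ read as v or w, or dropped) is an assumption for illustration:

```python
import re

def tolerant_pattern(word, confusions=None):
    """Build a regex for `word` that tolerates known OCR confusions.

    Each character listed in `confusions` may match any of its
    look-alikes -- or be absent, covering both substitution and deletion.
    """
    confusions = confusions or {"u": "[uvw]"}  # assumed confusion set
    parts = []
    for ch in word:
        if ch in confusions:
            parts.append(f"(?:{confusions[ch]})?")  # look-alike or dropped
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# All of these OCR variants of "guru" match the same pattern:
pattern = re.compile(tolerant_pattern("guru"))
assert all(pattern.fullmatch(s) for s in ["guru", "gvrv", "gr", "gwru"])
```

A pattern like this can locate damaged occurrences of known words so they can be normalized afterwards; it cannot repair words you did not anticipate.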
On Wednesday, September 13, 2023 at 1:02:01 PM UTC+3 mdalihu...@gmail.com wrote:

If some specific characters or words are always missing from the OCR result, then you can apply logic with regular expressions in your application: after OCR, those specific characters or words are replaced by the correct characters or words you have defined. That can solve some major problems.

On Wednesday, 13 September, 2023 at 3:51:29 pm UTC+6 desal...@gmail.com wrote:

The characters are getting missed, even after fine-tuning. I never made any progress, although I tried many different ways. Some specific characters are always missing from the OCR result.

On Wednesday, September 13, 2023 at 12:49:20 PM UTC+3 mdalihu...@gmail.com wrote:

EasyOCR, I think, is best for ID cards and similar images, but for document images like books, Tesseract is better than EasyOCR. I haven't used EasyOCR myself; you can try it.

I have added dictionary words, but the result is the same.

What kind of problem did you face when fine-tuning the few new characters, as you said ("but I failed in every possible way to introduce a few new characters into the database")?

On Wednesday, 13 September, 2023 at 3:33:48 pm UTC+6 desal...@gmail.com wrote:

Yes, we are new to this. I find the instructions (the manual) very hard to follow; the video you linked above was really helpful for getting started. My plan at the beginning was to fine-tune the existing .traineddata, but I failed in every possible way to introduce a few new characters into the database. That is why I started from scratch.

Sure, I will follow Lorenzo's suggestion: I will run more iterations and see if I can improve.

Another area we need to explore is the use of dictionaries. Maybe adding millions of words to the dictionary could help Tesseract. I don't have millions of words, but I am looking into some corpora to get more words into the dictionary.

If this all fails, EasyOCR (and probably other similar open-source packages) is our next option to try. Sure, sharing our experiences will be helpful; I will let you know if I make good progress with any of these options.

On Wednesday, September 13, 2023 at 12:19:48 PM UTC+3 mdalihu...@gmail.com wrote:

"How is your training going for Bengali?" It was nearly good, but I faced spacing problems between words: some words get spaces, but most of them get none. I think the problem is in the dataset, but I used the default Bengali training dataset from Tesseract, so I am confused and have to explore more. By the way, you can try what Lorenzo Blz said. Training from scratch really is harder than fine-tuning, so you can explore different datasets. If you succeed, please let me know how you did the whole process; I'm also new to this field.

On Wednesday, 13 September, 2023 at 1:13:43 pm UTC+6 desal...@gmail.com wrote:

How is your training going for Bengali?
I have been trying to train from scratch. I made about 64,000 lines of text (which produced about 255,000 files in the end) and ran the training for 150,000 iterations, reaching a 0.51 training error rate. I was hoping to get reasonable accuracy; unfortunately, when I run OCR using the resulting .traineddata, the accuracy is absolutely terrible. Do you think I made some mistakes, or is that an expected result?

On Tuesday, September 12, 2023 at 11:15:25 PM UTC+3 mdalihu...@gmail.com wrote:

Yes, he doesn't mention all fonts but only one font.
That way he didn't use MODEL_NAME in a separate script file, I think.

Actually, here we train on all the tif, gt.txt, and .box files that are created under MODEL_NAME (I mean the eng, ben, or oro language code), because when we first create the tif, gt.txt, and .box files, every file name starts with MODEL_NAME. That MODEL_NAME is what the training script uses to loop over each tif, gt.txt, and .box file it created.

On Tuesday, 12 September, 2023 at 9:42:13 pm UTC+6 desal...@gmail.com wrote:

Yes, I am familiar with the video and have set up the folder structure as you did. Indeed, I have tried a number of fine-tuning runs with a single font following Garcia's video. But your script is much better because it supports multiple fonts. The whole improvement you made is brilliant and very useful; it is all working for me.
The only part that I didn't understand is the trick you used in your tesseract_train.py script. You see, I have been doing exactly what you did except for this script.

The script seems to have the trick of sending/teaching each of the fonts (iteratively) into the model. The script I have been using (which I got from Garcia) doesn't mention fonts at all:

    TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME=oro TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000

Does that mean my model doesn't train on the fonts (even though the fonts have been included in the splitting process, in the other script)?
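On the dictionary idea raised earlier in the thread, here is a minimal sketch of preparing extra words for a model's wordlist. The corpus file name and the oro model name are assumptions for illustration; the Tesseract tool invocations are shown only as comments because they need a local traineddata file:

```python
# Build a deduplicated wordlist (one word per line) from a raw corpus.
# "corpus.txt" and the "oro" model name are assumed names for illustration.
def build_wordlist(corpus_path="corpus.txt", out_path="oro.wordlist"):
    with open(corpus_path, encoding="utf-8") as f:
        words = sorted(set(f.read().split()))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
    return words

# The wordlist would then be packed into the model with Tesseract's
# training tools (run manually; they require a local oro.traineddata):
#   combine_tessdata -u oro.traineddata oro.
#   wordlist2dawg oro.wordlist oro.lstm-word-dawg oro.unicharset
#   combine_tessdata oro.
```

Whether a bigger dictionary actually helps depends on the model; this only shows the mechanics of getting words into it.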
On Monday, September 11, 2023 at 10:54:08 AM UTC+3 mdalihu...@gmail.com wrote:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

1. This command is for training the data; I have named the script 'tesseract_training.py' and put it inside the tesstrain folder.
2. The root directory means your main training folder, which contains the langdata, tesseract, and tesstrain folders. If you watch this tutorial https://www.youtube.com/watch?v=KE4xEzFGSU8 you will understand the folder structure better. I only created tesseract_training.py in the tesstrain folder for training; the FontList.py file sits at the main path, alongside langdata, tesseract, tesstrain, and split_training_text.py.
3. First of all, you have to put all fonts in your Linux fonts folder, /usr/share/fonts/, then run sudo apt update and then sudo fc-cache -fv.

After that, you have to add the exact font names in the FontList.py file, like mine.
I have attached two pictures of my folder structure: the first is the main structure and the second is the collapsed tesstrain folder.

[image: Screenshot 2023-09-11 134947.png] [image: Screenshot 2023-09-11 135014.png]

On Monday, 11 September, 2023 at 12:50:03 pm UTC+6 desal...@gmail.com wrote:

Thank you so much for putting out these brilliant scripts. They make the process much more efficient.
I have one more question about the other script that you use to train:

    import subprocess

    # List of font names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000"
        subprocess.run(command, shell=True)

Do you have the names of the fonts listed in a file in the same/root directory? How do you set up the names of the fonts in that file, if you don't mind sharing it?

On Monday, September 11, 2023 at 4:27:27 AM UTC+3 mdalihu...@gmail.com wrote:

You can use the new script below; it's better than the previous two scripts. You can create the tif, gt.txt, and .box files with multiple fonts, and it also supports a breakpoint: if VS Code closes (or anything else interrupts it) while creating the tif, gt.txt, and .box files, you can use the checkpoint to resume from where you stopped.
Script for the tif, gt.txt, and .box files:

    import os
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def create_training_data(training_text_file, font_list, output_directory, start_line=None, end_line=None):
        with open(training_text_file, 'r') as input_file:
            lines = input_file.readlines()

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        if start_line is None:
            start_line = 0
        if end_line is None:
            end_line = len(lines) - 1

        for font_name in font_list.fonts:
            for line_index in range(start_line, end_line + 1):
                line = lines[line_index].strip()

                training_text_file_name = pathlib.Path(training_text_file).stem
                line_serial = f"{line_index:d}"

                file_base_name = f'{training_text_file_name}_{line_serial}_{font_name.replace(" ", "_")}'
                line_gt_text = os.path.join(output_directory, f'{file_base_name}.gt.txt')
                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                subprocess.run([
                    'text2image',
                    f'--font={font_name}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=330',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/eng.unicharset',
                ])

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/eng.training_text'
        output_directory = 'tesstrain/data/eng-ground-truth'

        font_list = FontList()
        create_training_data(training_text_file, font_list, output_directory, args.start, args.end)

Then create a file called "FontList.py" in the root directory and paste this:

    class FontList:
        def __init__(self):
            self.fonts = [
                "Gerlick",
                "Sagar Medium",
                "Ekushey Lohit Normal",
                "Charukola Round Head Regular, weight=433",
                "Charukola Round Head Bold, weight=443",
                "Ador Orjoma Unicode",
            ]

Then import it in the code above.

The breakpoint command:

    sudo python3 split_training_text.py --start 0 --end 11

Change the checkpoint range (--start 0 --end 11) as needed.
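The --start/--end checkpointing described above boils down to a small resumable-loop pattern. This is a sketch of the idea, with the line_count.txt file name taken from the scripts in this thread and the per-line processing step left abstract:

```python
import os

COUNT_FILE = "line_count.txt"  # progress file, as in the thread's scripts

def read_checkpoint():
    """Return the index of the next unprocessed line (0 on a fresh run)."""
    if os.path.exists(COUNT_FILE):
        with open(COUNT_FILE) as f:
            return int(f.read())
    return 0

def process_lines(lines, handle):
    """Run `handle` on each line, persisting progress after every step,
    so an interrupted run resumes exactly where it stopped."""
    for i in range(read_checkpoint(), len(lines)):
        handle(lines[i])
        with open(COUNT_FILE, "w") as f:
            f.write(str(i + 1))
```

If the script dies after line 11, the next invocation simply starts at line 12, with no manual --start/--end arguments needed.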
And the training checkpoint works as you already know.

On Monday, 11 September, 2023 at 1:22:34 am UTC+6 desal...@gmail.com wrote:

Hi mdalihu,
The script you posted here seems much more extensive than the one you posted before: https://groups.google.com/d/msgid/tesseract-ocr/0e2880d9-64c0-4659-b497-902a5747caf4n%40googlegroups.com

I have been using your earlier script. It is magical. How is this one different from the earlier one?

Thank you for posting these scripts, by the way. They have saved me countless hours by running multiple fonts in one sweep. I was not able to find any instructions on how to train for multiple fonts, and the official manual is also unclear. Your script helped me get started.

On Wednesday, August 9, 2023 at 11:00:49 PM UTC+3 mdalihu...@gmail.com wrote:

OK, I will try as you said.
One more thing: what should the lines of the training_text look like? I have seen that Bengali texts have long lines of words, so I want to know how many words or characters per line would be the better choice for training. And should '--xsize=3600', '--ysize=350' be set according to the words per line?

On Thursday, 10 August, 2023 at 1:10:14 am UTC+6 shree wrote:

Include the default fonts also in your fine-tuning list of fonts and see if that helps.
On Wed, Aug 9, 2023, 2:27 PM Ali hussain <mdalihu...@gmail.com> wrote:

I have trained some new fonts with the fine-tuning method for the Bengali language in Tesseract 5, and I have used the official training_text, tessdata_best, and everything else as well. Everything is good, but the problem is that the default fonts that were trained before no longer convert text as they did previously, while my new fonts work well. I don't understand why this is happening. I am sharing the code so you can see what is going on.

Code for creating the tif, gt.txt, and .box files:

    import os
    import random
    import pathlib
    import subprocess
    import argparse
    from FontList import FontList

    def read_line_count():
        if os.path.exists('line_count.txt'):
            with open('line_count.txt', 'r') as file:
                return int(file.read())
        return 0

    def write_line_count(line_count):
        with open('line_count.txt', 'w') as file:
            file.write(str(line_count))

    def create_training_data(training_text_file, font_list, output_directory, start_line=None, end_line=None):
        lines = []
        with open(training_text_file, 'r') as input_file:
            for line in input_file.readlines():
                lines.append(line.strip())

        if not os.path.exists(output_directory):
            os.mkdir(output_directory)

        random.shuffle(lines)

        if start_line is None:
            line_count = read_line_count()  # Resume the serial saved in the file
            start_index = 0
        else:
            line_count = start_line
            start_index = start_line

        if end_line is None:
            end_line_count = len(lines) - 1
        else:
            end_line_count = min(end_line, len(lines) - 1)

        for font in font_list.fonts:  # Iterate through all the fonts in the font_list
            for line in lines[start_index:end_line_count + 1]:
                training_text_file_name = pathlib.Path(training_text_file).stem

                # Generate a unique serial number for each line
                line_serial = f"{line_count:d}"

                # GT (ground truth) text filename
                line_gt_text = os.path.join(output_directory, f'{training_text_file_name}_{line_serial}.gt.txt')
                with open(line_gt_text, 'w') as output_file:
                    output_file.writelines([line])

                # Image filename (unique per rendered line via the serial)
                file_base_name = f'ben_{line_serial}'
                subprocess.run([
                    'text2image',
                    f'--font={font}',
                    f'--text={line_gt_text}',
                    f'--outputbase={output_directory}/{file_base_name}',
                    '--max_pages=1',
                    '--strip_unrenderable_words',
                    '--leading=36',
                    '--xsize=3600',
                    '--ysize=350',
                    '--char_spacing=1.0',
                    '--exposure=0',
                    '--unicharset_file=langdata/ben.unicharset',
                ])

                line_count += 1

        write_line_count(line_count)  # Update the line count in the file

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--start', type=int, help='Starting line count (inclusive)')
        parser.add_argument('--end', type=int, help='Ending line count (inclusive)')
        args = parser.parse_args()

        training_text_file = 'langdata/ben.training_text'
        output_directory = 'tesstrain/data/ben-ground-truth'

        # Create an instance of the FontList class
        font_list = FontList()

        create_training_data(training_text_file, font_list, output_directory, args.start, args.end)

And the training code:

    import subprocess

    # List of model names
    font_names = ['ben']

    for font in font_names:
        command = f"TESSDATA_PREFIX=../tesseract/tessdata make training MODEL_NAME={font} START_MODEL=ben TESSDATA=../tesseract/tessdata MAX_ITERATIONS=10000 LANG_TYPE=Indic"
        subprocess.run(command, shell=True)

Any suggestions for identifying the problem?
Thanks, everyone.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/406cd733-b265-4118-a7ca-de75871cac39n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/4893f0d4-c580-4dc4-b5b7-2bb99ee14540n%40googlegroups.com.
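As a closing aside on the training loop that appears throughout this thread: a small hardening (a sketch, not the thread's own code) is to check the return code of each `make training` invocation and stop at the first failure, instead of silently continuing to the next model. The `dry_run` flag is an addition for illustration so the command strings can be inspected without `make` installed:

```python
import subprocess

def run_training(models, start_model="ben", max_iterations=10000, dry_run=False):
    """Invoke the tesstrain Makefile once per model tag, failing fast."""
    commands = {}
    for model in models:
        cmd = (
            f"TESSDATA_PREFIX=../tesseract/tessdata make training "
            f"MODEL_NAME={model} START_MODEL={start_model} "
            f"TESSDATA=../tesseract/tessdata MAX_ITERATIONS={max_iterations}"
        )
        commands[model] = cmd
        if not dry_run:
            # check=True raises CalledProcessError if make exits non-zero,
            # so a broken training run is not silently skipped.
            subprocess.run(cmd, shell=True, check=True)
    return commands
```

With `dry_run=True` the function just returns the commands it would run, which is also a convenient way to sanity-check the Makefile variables before a long training session.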