Take a look at the attached file. How can I implement ALTO in it, and what will the command look like in the terminal?
Tommy

On Thursday, 8 August 2019 at 12:04:23 UTC+2, Tommy Klausen wrote:
>
> Ok.
>
> Because if a config file for alto exists (which it didn't for some reason in
> the install), I can just write the command with "alto" at the end, right?
>
> Can you give me the two different commands for reading an image (with and
> without the config file)?
>
> On Thursday, 8 August 2019 at 11:51:27 UTC+2, shree wrote:
>>
>> https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/configs/alto
>>
>> You can use the `alto` config file or pass the config variable as part of
>> the command:
>>
>> -c tessedit_create_alto=1
>>
>> On Thu, Aug 8, 2019 at 2:59 PM Tommy Klausen <klaus...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> Is the ALTO config option supported in the latest Linux version of
>>> Tesseract?
>>> I have managed to use hOCR but not ALTO.
>>> Is it something I need to do with the config files?
>>>
>>> Tommy
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/de13eba7-8b6f-47bc-b1a7-981bc87e1ed5%40googlegroups.com
>>
>> --
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
# import the necessary packages
from PIL import Image
import pytesseract
import argparse
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", type=str, default="thresh",
    help="type of preprocessing to be done")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# check to see if we should apply thresholding to preprocess the
# image
if args["preprocess"] == "thresh":
    gray = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# make a check to see if median blurring should be done to remove
# noise
elif args["preprocess"] == "blur":
    gray = cv2.medianBlur(gray, 3)

# write the grayscale image to disk as a temporary file so we can
# apply OCR to it
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)

# show the output images
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)
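
For what it's worth, here is a rough sketch of the two terminal commands discussed in the thread (with the `alto` config file, and with the `-c tessedit_create_alto=1` variable instead), built as argument lists in Python so they could be passed to `subprocess.run`. The helper names, `page.png`, and `out` are placeholders I made up for illustration. It also assumes a Tesseract build recent enough to support ALTO output (4.1 or later, as far as I know), and that the result is written to `out.xml`.

```python
import shlex


def alto_command_with_config_file(image_path, output_base):
    """Command that relies on the 'alto' config file in tessdata/configs.

    Hypothetical helper: only the argument order
    (image, output base, config name) comes from the Tesseract CLI.
    """
    return ["tesseract", image_path, output_base, "alto"]


def alto_command_with_variable(image_path, output_base):
    """Command that sets the config variable directly.

    Useful when the 'alto' config file is missing from the install.
    """
    return ["tesseract", image_path, output_base,
            "-c", "tessedit_create_alto=1"]


# Placeholder file names, not taken from the thread.
with_config = alto_command_with_config_file("page.png", "out")
without_config = alto_command_with_variable("page.png", "out")

# Print the commands in the form you would type in a terminal.
print(shlex.join(with_config))
print(shlex.join(without_config))
```

To actually run one of them, something like `subprocess.run(with_config, check=True)` should work if `tesseract` is on the PATH. Inside the script itself, newer pytesseract releases also seem to offer `pytesseract.image_to_alto_xml(...)` as a counterpart to `image_to_string`, but check that your installed version has it before relying on it.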