Re: [tesseract-ocr] Support for alto - option in Tesseract for linux

Tommy Klausen Thu, 08 Aug 2019 03:53:13 -0700

Take a look at the attached file.

How can I implement ALTO in it, and what will the command look like in 
the terminal?


Tommy

On Thursday, 8 August 2019 at 12:04:23 UTC+2, Tommy Klausen wrote:
>
> Ok.
>
> Because if a config file for alto exists (which it didn't for some reason in 
> the install), I can just write the command with "alto" at the end, right?
>
> Can you give me the two different commands for reading an image (with and 
> without the config file)?
>
> On Thursday, 8 August 2019 at 11:51:27 UTC+2, shree wrote:
>>
>>
>> https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/configs/alto
>>  
>>
>> You can use the `alto` config file or set the config variable as part of 
>> the command:
>>
>> -c tessedit_create_alto=1 
>>
>> On Thu, Aug 8, 2019 at 2:59 PM Tommy Klausen <klaus...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> Is the ALTO config option supported in the latest Linux version of 
>>> Tesseract?
>>> I have managed to use hOCR but not ALTO.
>>> Is it something I need to do with the config files?
>>>
>>> Tommy
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to tesser...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/de13eba7-8b6f-47bc-b1a7-981bc87e1ed5%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/tesseract-ocr/de13eba7-8b6f-47bc-b1a7-981bc87e1ed5%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> -- 
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>


# import the necessary packages
from PIL import Image
import pytesseract
import argparse
import cv2
import os
 
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", type=str, default="thresh",
	help="type of preprocessing to be done")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
 
# check to see if we should apply thresholding to preprocess the
# image
if args["preprocess"] == "thresh":
	gray = cv2.threshold(gray, 0, 255,
		cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
 
# make a check to see if median blurring should be done to remove
# noise
elif args["preprocess"] == "blur":
	gray = cv2.medianBlur(gray, 3)
 
# write the grayscale image to disk as a temporary file so we can
# apply OCR to it
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)
 
# show the output images
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)
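
The two command forms mentioned in the thread (the `alto` config file, or `-c tessedit_create_alto=1`) can be sketched from Python as below. This is only a rough sketch, not a tested integration with the script above: the helper names `build_alto_command` and `run_alto` and the file names `page.png` / `page` are made up for illustration, and it assumes a `tesseract` binary on the PATH that is new enough to support ALTO and that writes the result to `<output base>.xml`.

```python
import subprocess


def build_alto_command(image_path, output_base, use_config_file=True):
    """Return the tesseract argument list for ALTO output (hypothetical helper)."""
    cmd = ["tesseract", image_path, output_base]
    if use_config_file:
        # Relies on a config file named "alto" being present in tessdata/configs.
        cmd.append("alto")
    else:
        # Sets the same variable directly, so no config file is needed.
        cmd += ["-c", "tessedit_create_alto=1"]
    return cmd


def run_alto(image_path, output_base, use_config_file=True):
    """Run tesseract and return the path where the ALTO XML is assumed to land."""
    cmd = build_alto_command(image_path, output_base, use_config_file)
    subprocess.run(cmd, check=True)
    return output_base + ".xml"


if __name__ == "__main__":
    # Print both terminal commands; "page.png" and "page" are placeholder names.
    for use_config in (True, False):
        print(" ".join(build_alto_command("page.png", "page", use_config)))
```

In the attached script, something like `run_alto(filename, "output")` could be called on the temporary PNG before `os.remove(filename)`. Newer pytesseract releases also expose `image_to_alto_xml`, which may be simpler if your installed version has it.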
