[tesseract-ocr] I have a problem with the current tesseract

2019-05-07 Thread Pkumar ..
I have a problem with the current tesseract. I use tesseract in PHP code as shown below: Image Upload Success"; echo ''; shell_exec('"C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe" "D:\\xampp\\htdocs\\Image_OCR\\Image1\\'.$file_name.'" out'); echo "OCR after reading"; $myfile = fo

[tesseract-ocr] How to extract text for processing by tesseract v4?

2019-05-07 Thread Jason
I have a problem with the current tesseract. I have documents with sections of varying background and text colors. I've read that tesseract v3 was white/black invariant, so it didn't matter if I had white text on a red background. But now it matters. The problem is, other parts in the same im

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
Yeah, my bad. It seems that just removing the resize code has made it able to detect the 1 as well. Thanks for all the help. On Tuesday, May 7, 2019 at 4:04:19 PM UTC-4, zdenop wrote: > > Sorry I misread the python message: actually tesseract did not find there > anything (which at the end i

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Zdenko Podobny
Sorry, I misread the Python message: tesseract actually did not find anything there (which in the end amounts to the same thing as an l ;-) ). First of all, I would stop with that idiotic resizing. AFAIR, tesseract works best with letters 13-30 px in size. Yours are 240! Try to provide the original captured i
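The size advice above can be turned into a small calculation. This is only a sketch: the helper name scale_for_target_height is hypothetical, the 13-30 px range is taken from the message (recalled "AFAIR" by its author), and aiming for the midpoint of that range is an assumption made here for illustration.

```python
def scale_for_target_height(glyph_height_px, low=13, high=30):
    """Return a resize factor that brings the glyph height into [low, high].

    Hypothetical helper: the 13-30 px range comes from the thread; targeting
    the midpoint of the range is an assumption, not a tesseract rule.
    """
    if glyph_height_px <= 0:
        raise ValueError("glyph height must be positive")
    if low <= glyph_height_px <= high:
        return 1.0  # already in range, leave the image alone
    target = (low + high) / 2
    return target / glyph_height_px


# Letters that were enlarged to 240 px would need to shrink considerably.
factor = scale_for_target_height(240)
print(round(factor, 3))
```

The factor would then be applied to the image width and height with whatever resize routine the script already uses.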

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
I see. So should I fool around with the contrast, or is there a way to make it use only number selection when reading the image? On Tuesday, May 7, 2019 at 3:26:50 PM UTC-4, zdenop wrote: > > probably because it is recognized as "l" instead of 1 and you can not > convert letter to inte
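One possible way to restrict recognition to digits is Tesseract's tessedit_char_whitelist variable, passed with -c. The sketch below only builds the config string; the helper name digits_only_config is hypothetical, the psm/oem defaults are copied from elsewhere in this thread, and whether the whitelist is honored may depend on the Tesseract version and engine in use, so treat it as something to try rather than a guaranteed fix.

```python
def digits_only_config(psm=8, oem=3):
    """Build a config string asking tesseract to consider only digits.

    Hypothetical helper. tessedit_char_whitelist is a Tesseract config
    variable; its effect may vary by version and OCR engine mode.
    """
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist=0123456789"


print(digits_only_config())
```

The resulting string would be passed as the config argument of pytesseract.image_to_string, appended after any --tessdata-dir option with a separating space.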

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Zdenko Podobny
Probably because it is recognized as "l" instead of 1, and you cannot convert a letter to an integer. Zdenko On Tue, 7 May 2019 at 21:21, Sean Connell wrote: > Thanks a bunch it works now. The only issue it has is when trying to > detect the number 1 for some reason it just thinks nothing is there. >
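The failure described above (int() raising on "l" or on an empty result) can be guarded against in the calling script. A minimal sketch, assuming the OCR result is a plain string; the helper name parse_digit and the look-alike table are illustrative assumptions, not part of pytesseract.

```python
def parse_digit(ocr_text):
    """Convert OCR output to an int, or return None if it is not a number.

    Hypothetical helper: maps a few characters commonly confused with
    digits (an assumed, incomplete table) before attempting conversion.
    """
    lookalikes = {"l": "1", "I": "1", "|": "1", "O": "0", "o": "0"}
    cleaned = "".join(lookalikes.get(ch, ch) for ch in ocr_text.strip())
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None  # empty or non-numeric result; caller decides what to do


print(parse_digit("l\n"))
print(parse_digit(""))
```

Returning None instead of raising lets the script retry with different preprocessing when nothing usable was recognized.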

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Lorenzo Bolzani
This is where you need to improve contrast. https://pillow.readthedocs.io/en/stable/reference/ImageEnhance.html You need to play a little with PIL to find out what works best for your data. Lorenzo On Tue, 7 May 2019 at 21:21, Sean Connell < nightfire120sla...@gmail.com> wrote:

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
Thanks a bunch, it works now. The only issue is that when trying to detect the number 1, for some reason it just thinks nothing is there. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Zdenko Podobny
You need to add a space at the end of tessdata_dir_config because you later append another string with configurations to it: tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata" ' Zdenko On Tue, 7 May 2019 at 19:57, Sean Connell wrote: > So I added this line of code tessdata_dir_config =
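The missing-space problem comes from plain string concatenation, which can be sidestepped by joining the fragments with an explicit separator. A small sketch; join_config is a hypothetical helper, and the path and flags are the ones quoted in this thread.

```python
def join_config(*parts):
    """Join config fragments with single spaces, skipping empty ones.

    Hypothetical helper: avoids relying on a trailing space in any fragment.
    """
    return " ".join(p.strip() for p in parts if p and p.strip())


tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata"'

# Plain concatenation glues the closing quote to the next option.
broken = tessdata_dir_config + "--psm 8 --oem 3"
fixed = join_config(tessdata_dir_config, "--psm 8 --oem 3")

print(broken)
print(fixed)
```

With the joined form, it no longer matters whether an individual fragment ends in a space.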

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
So I added this line of code, tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata"', and downloaded the English repository from the link you provided, but now I get an error (see attached picture). On Tuesday, May 7, 2019 at 12:41:33 PM UTC-4, zdenop wrote: > > This change was sufficie

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Zdenko Podobny
This change was sufficient. BTW: I use data from the best repository[1] [1] https://github.com/tesseract-ocr/tessdata_best Zdenko On Tue, 7 May 2019 at 18:01, Sean Connell wrote: > Thank you I'll give that a go and see if it works any better. Is it worth > trying to increase the contrast as well?

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
Thank you, I'll give that a go and see if it works any better. Is it worth trying to increase the contrast as well? On Tuesday, May 7, 2019 at 9:52:21 AM UTC-4, zdenop wrote: > > modify last part of your code to this: > > # invert image and convert to grayscale > inverted = PIL.ImageOps.invert(new

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Zdenko Podobny
Modify the last part of your code to this: # invert image and convert to grayscale inverted = PIL.ImageOps.invert(newim2).convert('LA') loopTest = (pytesseract.image_to_string( inverted, config=tessdata_dir_config + '--psm 8 --oem 3')) print(loopTest) loopTest = int(loopTest) Do not forget to imp

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Sean Connell
Thanks a bunch for the response. How would I go about inverting the image and increasing the contrast, though? Sorry, I'm still learning how all this works.

Re: [tesseract-ocr] Extract font size,style,colour from an image

2019-05-07 Thread VANAM VISHAL
I am using the tesseract 3.05.00 stable version alongside tesserocr and couldn't use WordFontAttributes to check whether a word is bold, its font size, etc. I can detect the text, but not its size or whether it is bold. On Friday, May 19, 2017 at 10:39:59 PM UTC+5:30, zdenop wrote: > > tesseract 3.05 (the

Re: [tesseract-ocr] OCR Failing to Consistenly Recongnize the single digit in my screenshot

2019-05-07 Thread Lorenzo Bolzani
Hi, try to invert the images (black text on white) and use psm 6 or 7. Increasing contrast may also help. Lorenzo On Tue, 7 May 2019 at 08:49, Sean Connell wrote: > Currently my program searches for the picture of the word Opponents on the > screen then moves a bit and takes a picture of the