[tesseract-ocr] Tesseract confused between a character and a digit which look-alike

2022-06-07 Thread 'Yash Mistry' via tesseract-ocr
I am having trouble extracting the correct character from a word when characters look alike, e.g. 5 & S, I & 1, 8 & S. I applied image pre-processing techniques such as blurring, erosion, dilation, noise normalisation, removal of unnecessary components, and text detection on the input image, but after these much

Re: [tesseract-ocr] Tesseract confused between a character and a digit which look-alike

2022-06-07 Thread Lorenzo Bolzani
Hi Yash, in my experience you are going to see a lot of these errors on similar characters. Given only the pre-processed text, I might make the same mistake myself. What I do is fix these letters according to a pattern, in this case WDDD, and I replace: S <-> 8, O <-> 0, I <-> 1, i <-> 1
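
The pattern-based correction described above could be sketched roughly as follows. This is only an illustration, not Lorenzo's actual code: it assumes that W in the pattern means "letter" and D means "digit", and the function name fix_by_pattern and the two lookup tables are hypothetical. The substitution pairs are the ones listed in the message.

```python
# Assumption: in a pattern such as "WDDD", W marks a letter slot and D a digit slot.
# Substitution pairs taken from the message: S <-> 8, O <-> 0, I <-> 1, i <-> 1.
TO_DIGIT = {"S": "8", "O": "0", "I": "1", "i": "1"}
TO_LETTER = {"8": "S", "0": "O", "1": "I"}


def fix_by_pattern(token: str, pattern: str) -> str:
    """Coerce each character of an OCR token to the class its pattern slot expects."""
    if len(token) != len(pattern):
        return token  # length mismatch: leave the OCR output untouched
    fixed = []
    for ch, slot in zip(token, pattern):
        if slot == "D" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)   # look-alike letter found in a digit slot
        elif slot == "W" and ch.isdigit():
            ch = TO_LETTER.get(ch, ch)  # look-alike digit found in a letter slot
        fixed.append(ch)
    return "".join(fixed)


print(fix_by_pattern("8O1I", "WDDD"))  # -> S011
```

Characters with no entry in the tables are left unchanged, so a token that cannot be repaired this way still shows up as a mismatch for later review.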

[tesseract-ocr] Tesseract .uzn zone file

2022-06-07 Thread Simas Skubutis
Environment - *Tesseract Version*: 5.1.0 - *Platform*: Windows 10 64bit *Problem with .uzn file* With Tesseract 4.1.0 everything worked perfectly. I used the command tesseract inputPhotoName.png outputName -l eng --oem 1 --psm 4 hocr and tesseract automatically picked up inp

Re: [tesseract-ocr] "Error in selectDefaultPdfEncoding: type selection failure" on Tesseract 5.1.0 in Ubuntu

2022-06-07 Thread Lucas L.
Sure, I will write that up. Thanks for helping, zdenop. Would you happen to know which is the most recent version that does not exhibit this issue so I can switch to that? On Tuesday, June 7, 2022 at 12:27:08 AM UTC-5 zdenop wrote: > Can you please create an issue at > https://github.com/tesse

Re: [tesseract-ocr] "Error in selectDefaultPdfEncoding: type selection failure" on Tesseract 5.1.0 in Ubuntu

2022-06-07 Thread Lucas L.
Also, I feel compelled to mention that I think I have seen this on some of my unupdated VMs running 4.1.1, also built from source, on the same document. Sorry for the spam, I wish I could edit. I think it may be tied to leptonica specifically or something else in the environment? The same versi

[tesseract-ocr] Tesseract Offline Blazor Error.

2022-06-07 Thread Leon Komendant
Hello, I'm trying to add an OCR function to my Blazor website. For that I have a JS file (ocr.js) that creates a Worker that should recognize the image. The paths are all correct. My website uses https with a self-signed certificate. Like this, everything works as long as I'm online, b