I generally resize images to help correct errors like this.
For example, for your test1.bmp, I ran:
*convert test1.bmp -resize 400% testnew.bmp*
I used ImageMagick to resize the image. After this, Tesseract identified
':' correctly.
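A minimal Python sketch of the same upscaling step, assuming ImageMagick's `convert` is on the PATH (ImageMagick 7 uses `magick` as its main entry point and keeps `convert` for compatibility). The 400% default simply mirrors the command above, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def upscale_cmd(src: str, dst: str, percent: int = 400) -> List[str]:
    """Build the ImageMagick command that scales `src` by `percent` into `dst`."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return ["convert", src, "-resize", f"{percent}%", dst]


def upscale(src: str, dst: str, percent: int = 400) -> None:
    """Run the resize (hypothetical helper; needs ImageMagick on PATH)."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on PATH")
    subprocess.run(upscale_cmd(src, dst, percent), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ImageMagick.
    print(" ".join(upscale_cmd("test1.bmp", "testnew.bmp")))
    # convert test1.bmp -resize 400% testnew.bmp
```

As later messages in this thread point out, upscaling can fix one character while introducing other errors, so it is worth comparing results on a few samples before applying it to a whole batch.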
Though sometimes, image resizing introduces some other
I do not have a technical reason for you, but I can confirm that Tesseract is
sensitive to the padding around the words you are trying to detect (perhaps
something to do with its page segmentation). In my experience, it is best to
make sure the text has enough white space around it.
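If you want to add that white space programmatically, one option is ImageMagick's `-bordercolor` and `-border` settings (the default border color is a light gray, so the color is set explicitly). The sketch assumes ImageMagick is installed; the 20-pixel margin is an arbitrary starting value, not a figure from this thread, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def pad_cmd(src: str, dst: str, border_px: int = 20) -> List[str]:
    """Build an ImageMagick command that surrounds `src` with a white margin.

    The color setting is given before the -border operator it should affect,
    and the margin of `border_px` pixels is added on every side.
    """
    if border_px < 0:
        raise ValueError("border_px must be non-negative")
    return ["convert", src, "-bordercolor", "white", "-border", str(border_px), dst]


def pad(src: str, dst: str, border_px: int = 20) -> None:
    """Run the padding step (hypothetical helper; needs ImageMagick on PATH)."""
    subprocess.run(pad_cmd(src, dst, border_px), check=True)


if __name__ == "__main__":
    print(" ".join(pad_cmd("word.png", "word_padded.png")))
```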
On 6 June 2016 at 18:23, 'Carlo' via tesser
I had the same problem with Swedish, and a temporary workaround helped
me: I zoomed (rescaled) the image to 400% and it recognized the letter
(though it introduced other problems). I'm not sure, but it could improve
results for you.
Ashish
On Mon, Jun 6, 2016 at 8:53 PM, Tom Morris wrote:
> On Monday,
I am trying to process a PNG image. Will it work if I convert my PNG to
TIFF before OCRing?
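A minimal sketch of that conversion, assuming ImageMagick is installed. Converting only helps if your Leptonica build was compiled with TIFF support. The output name is derived from the input with `pathlib`, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def png_to_tiff_cmd(src: str) -> List[str]:
    """Build the ImageMagick command that writes a .tif copy next to `src`.

    ImageMagick picks the output format from the file extension.
    """
    dst = Path(src).with_suffix(".tif")
    return ["convert", src, str(dst)]


def png_to_tiff(src: str) -> Path:
    """Run the conversion and return the path of the new TIFF file."""
    cmd = png_to_tiff_cmd(src)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


if __name__ == "__main__":
    print(png_to_tiff_cmd("page1.png"))  # ['convert', 'page1.png', 'page1.tif']
```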
On Mon, Jun 6, 2016 at 5:28 PM, Zdenko Podobný wrote:
> Your Leptonica build supports only a limited number of image formats. What
> image are you trying to process?
>
> Zdenko
>
> On Mon, Jun 6, 2016 at 1:08 PM,
On Monday, June 6, 2016 at 3:17:29 AM UTC-4, Doron Saar wrote:
>
>
> I'm trying to train Tesseract to work with a large library of Hebrew
> language documents.
>
Why? Did you get unacceptable results with the standard Hebrew language
data?
https://github.com/tesseract-ocr/tessdata/blob/master/h
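To try the stock Hebrew data first, Tesseract selects a language with `-l`, and `heb` is the code used for Hebrew in the tessdata repository. The sketch below assumes a `tesseract` binary that supports `--list-langs`. Parsing the listing as a header line followed by one code per line is an assumption about its output format, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Set


def parse_langs(listing: str) -> Set[str]:
    """Extract language codes from `tesseract --list-langs` output.

    Assumes a header line starting with "List of available languages"
    followed by one language code per line.
    """
    langs = set()
    for line in listing.splitlines():
        line = line.strip()
        if line and not line.startswith("List of available languages"):
            langs.add(line)
    return langs


def installed_langs() -> Set[str]:
    # Some releases print this listing on stderr, so capture both streams.
    out = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
    return parse_langs(out.stdout + out.stderr)


def hebrew_cmd(image: str, outbase: str) -> List[str]:
    """Build a command that OCRs `image` with the standard Hebrew data."""
    return ["tesseract", image, outbase, "-l", "heb"]
```

A caller would check `"heb" in installed_langs()` before running `hebrew_cmd(...)`, which separates "the data is missing" from "the data gives poor results".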
Your Leptonica build supports only a limited number of image formats. What
image are you trying to process?
Zdenko
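One way to see which formats a build can read is to look at the library list that `tesseract --version` prints under the Leptonica line. A rough sketch, assuming that output names libraries such as `libpng` and `libtiff`. The sample string is illustrative, not taken from this thread. Depending on the release, the banner may go to stderr rather than stdout, so both streams are captured.

```python
import re
import subprocess
from typing import Set

# Image libraries Leptonica can be linked against (not an exhaustive list).
_LIB_PATTERN = re.compile(r"\blib(?:png|tiff|jpeg|gif|webp|openjp2)\b")


def image_libs(version_text: str) -> Set[str]:
    """Return the image-library names mentioned in `tesseract --version` output."""
    return set(_LIB_PATTERN.findall(version_text))


def tesseract_version_text() -> str:
    """Run `tesseract --version` and return everything it printed."""
    out = subprocess.run(["tesseract", "--version"], capture_output=True, text=True)
    return out.stdout + out.stderr


if __name__ == "__main__":
    # Illustrative sample, not real output from anyone's machine.
    sample = "tesseract 3.04.01\n leptonica-1.73\n  libjpeg 8d : libtiff 4.0.6 : zlib 1.2.8\n"
    print(sorted(image_libs(sample)))  # ['libjpeg', 'libtiff']
```

If `libpng` is missing but `libtiff` is present, converting PNG input to TIFF before OCR is a reasonable workaround.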
On Mon, Jun 6, 2016 at 1:08 PM, Ashish Goel wrote:
> Hello All,
>
> I am trying to do OCR on a bunch of images. Getting some failures, and I
> want to analyse them.
> So, to do that, I am tr
Hello All,
I am trying to do OCR on a bunch of images. I am getting some failures, and I
want to analyse them.
To do that, I am trying to get the tessinput.tif file so that I can
see what input actually goes to Tesseract.
I am passing "-c tessedit_write_images 1" along with my tesseract to
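For reference, the tesseract manual documents `-c` as taking `CONFIGVAR=VALUE` with an equals sign, so the option is normally written `-c tessedit_write_images=1`. A minimal sketch, assuming a Tesseract release that accepts `-c` and that the debug image (tessinput.tif) is written to the working directory. Running in a dedicated directory makes it easy to find. Helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def debug_ocr_cmd(image: str, outbase: str) -> List[str]:
    """Build a tesseract command that also dumps the image it actually processed."""
    return ["tesseract", image, outbase, "-c", "tessedit_write_images=1"]


def run_with_debug_image(image: str, workdir: str) -> Path:
    """Run tesseract inside `workdir` and return the expected tessinput.tif path.

    Assumption: tesseract writes tessinput.tif into its current directory.
    """
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    src = str(Path(image).resolve())  # absolute path, since cwd changes below
    subprocess.run(debug_ocr_cmd(src, "out"), cwd=work, check=True)
    return work / "tessinput.tif"
```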
I keep getting the same mistakes:
the letter ו is often read as ט,
the letter נ is often read as ),
and so on.
When I add more training data files, I just get worse results instead of
better ones.
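To make recurring errors like these easier to analyse, you can count which characters get substituted for which by aligning a hand-checked transcription with Tesseract's output. A sketch using Python's `difflib`. This is a rough character-level alignment of my own choosing, not something Tesseract provides, and only same-length substitution runs are counted. The sample words are illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_pairs(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) character substitutions between a
    ground-truth transcription and the OCR output."""
    pairs = Counter()
    sm = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # Only same-length 'replace' runs can be paired character by character.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, got in zip(truth[i1:i2], ocr[j1:j2]):
                pairs[(expected, got)] += 1
    return pairs


if __name__ == "__main__":
    # Illustrative (ground truth, OCR output) pairs.
    samples = [("שלום", "שלטם"), ("נר", ")ר")]
    total = Counter()
    for truth, ocr in samples:
        total += confusion_pairs(truth, ocr)
    for (expected, got), n in total.most_common():
        print(f"{expected} -> {got}: {n}")
```

Running this over a few pages gives a ranked list of confusions, which is more concrete to report than "the same mistakes all the time".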
On Monday, June 6, 2016 at 1:51:45 PM UTC+3, Ashish Goel wrote:
>
> If you can elaborat
If you can elaborate on what kind of failures you are experiencing, people
might be able to help.
On Monday, June 6, 2016 at 12:47:29 PM UTC+5:30, Doron Saar wrote:
>
> Hi,
>
> I'm trying to train Tesseract to work with a large library of Hebrew
> language documents.
> They are all in good qual
Hi,
I'm trying to train Tesseract to work with a large library of Hebrew
language documents.
They are all good-quality black-and-white scans, and most of them
have the same font and character size.
The Hebrew alphabet should be relatively simple for OCR: 27
characters, no Upper/Low
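The count of 27 checks out against Unicode: the Hebrew letters occupy U+05D0 (alef) through U+05EA (tav), which is 22 base letters plus 5 final forms, and none of them has an upper/lower case mapping. A quick check:

```python
# Hebrew letters in Unicode: U+05D0 (alef) .. U+05EA (tav), inclusive.
HEBREW_LETTERS = "".join(chr(cp) for cp in range(0x05D0, 0x05EA + 1))
# Final (word-end) forms of kaf, mem, nun, pe and tsadi.
FINAL_FORMS = "\u05da\u05dd\u05df\u05e3\u05e5"
BASE_LETTERS = "".join(c for c in HEBREW_LETTERS if c not in FINAL_FORMS)

# Unicode defines no case for Hebrew, so upper() and lower() leave letters unchanged.
NO_CASE = all(c.upper() == c == c.lower() for c in HEBREW_LETTERS)

if __name__ == "__main__":
    print(len(HEBREW_LETTERS), len(BASE_LETTERS), len(FINAL_FORMS), NO_CASE)
    # 27 22 5 True
```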