date:20240222

Re: [tesseract-ocr] Re: tesseract training flags to rtl languages

2024-02-22 Thread Ger Hobbelt

On Thu, 22 Feb 2024, 07:32 Dror Musai wrote:
> Hi
>
> using version 5.3 of tesseract with hebrew lang. still not understand
> why adobe + foxit , can not find word in the pdf after ocr.
>
Pdf does not equal "text"! Pdf is a *complex* format where, more often than not, human-visible "t

Re: [tesseract-ocr] generic meme extraction?

2024-02-22 Thread Glenn Cochran

Hi experts, I’ve read that tesseract is not good at image OCR, for images like internet photos, but does well on pdf text. Is this true, or do I need to build some complex training to guide it?
On Feb 14, 2024, at 12:28, Glenn C wrote:
> Hi all, I'm trying to build a meme text extractio

Re: [tesseract-ocr] Re: tesseract training flags to rtl languages

2024-02-22 Thread Tom Morris

I only skimmed Ger's long reply, but didn't see a link to the issue, which I think is the important bit of information: https://github.com/tesseract-ocr/tesseract/issues/238
It's a long-standing (and complex) problem in which behavior varies across different PDF viewers.
Tom

[tesseract-ocr] Help recognizing text from image

2024-02-22 Thread Will Fetherolf

All, I need some help extracting the text from this image. I'm using the command line version of Tesseract from UB Mannheim. I think version 5.2 is installed. I've tried every PSM, and nothing seems to pull it out. If I crop off the minus sign, it works perfectly. Any tips at all would be appreci