Hi Misti,

Thanks for the info.
Will have a look at that.
Yes, getting a good picture as a blind person isn't all that easy.
Which output format would best preserve formatting, headings, 
and other structure? hOCR?

Greetings,
Simon
From: tesseract-ocr@googlegroups.com <tesseract-ocr@googlegroups.com> On Behalf 
Of Misti Hamon
Sent: Tuesday, 30 April 2024 20:44
To: tesseract-ocr@googlegroups.com
Subject: Re: [tesseract-ocr] Using Tesseract as an OCR solution for blind people

Image quality matters. Upside-down or sideways images really need to be rotated 
first. That is easy to do without opening an image editor: you just need to 
change the JPG's metadata.
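
A minimal sketch of the metadata approach, assuming the rotation is recorded in 
the standard EXIF Orientation tag (0x0112) of the JPG. It only reads the tag, 
using the Python standard library; the function names and the ROTATE_CW table 
are illustrative, and scanned pages may carry no such tag at all.

```python
import struct

# Clockwise rotation (degrees) needed to show the image upright, for the
# EXIF orientation values that are pure rotations (mirrored values omitted).
ROTATE_CW = {1: 0, 3: 180, 6: 90, 8: 270}


def _orientation_from_tiff(tiff: bytes) -> int:
    """Scan the first IFD of an EXIF TIFF block for the Orientation tag."""
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if endian is None or len(tiff) < 8:
        return 1
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return 1
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]
        if len(entry) < 12:
            break
        tag, typ = struct.unpack(endian + "HH", entry[:4])
        if tag == 0x0112 and typ == 3:  # Orientation, stored as a SHORT
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return 1


def exif_orientation(data: bytes) -> int:
    """Return the EXIF orientation (1-8) of JPEG bytes, or 1 if none is found."""
    if data[:2] != b"\xff\xd8":  # no JPEG start-of-image marker
        return 1
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments are over
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and body[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(body[6:])
        pos += 2 + seg_len
    return 1
```

With the bytes of a file in hand, ROTATE_CW.get(exif_orientation(data)) gives 
the clockwise rotation to apply before OCR, or None for mirrored orientations.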

It sounds like you are processing textbooks to turn them into something a 
screen reader can manage? Headers and such do get recognized, but you will 
probably have to post-process the Tesseract results. Screen readers like 
hierarchical formats, so you will have to look at the output formats Tesseract 
provides and see whether one of them can be fed directly to a screen reader. 
Tesseract has problems with charts and tables, especially if they have a 
background color or row or column striping. If the images you are working with 
aren't evenly lit, or are low DPI, there will be problems too. (Personal 
experience here: I have been processing textbook-style books myself.)
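
As a rough illustration of such post-processing, the sketch below reads hOCR 
output and labels unusually tall lines as headings. It assumes each line is a 
span whose class is ocr_line (or one of the related classes listed in 
LINE_CLASSES) with a "bbox x0 y0 x1 y1" entry in its title attribute. The 1.3 
height ratio and the names HocrLines and mark_headings are made up for this 
example, and bounding-box height is only a crude stand-in for font size.

```python
import re
from html.parser import HTMLParser
from statistics import median

# Line-level classes to accept; anything beyond ocr_line is an assumption
# about what the hOCR file may contain.
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_textfloat", "ocr_caption"}


class HocrLines(HTMLParser):
    """Collect (height_in_pixels, text) for each line-level span."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._height = 0
        self._words = []
        self._depth = 0  # span nesting depth inside the current line

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:  # a word span nested inside the current line
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
        if classes & LINE_CLASSES and bbox:
            _, y0, _, y1 = map(int, bbox.groups())
            self._height, self._words, self._depth = y1 - y0, [], 1

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0 and self._words:  # the line span just closed
            self.lines.append((self._height, " ".join(self._words)))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._words.append(data.strip())


def mark_headings(hocr: str, ratio: float = 1.3):
    """Label each line 'heading' if it is much taller than the median line."""
    parser = HocrLines()
    parser.feed(hocr)
    if not parser.lines:
        return []
    typical = median(height for height, _ in parser.lines)
    return [("heading" if height >= ratio * typical else "text", text)
            for height, text in parser.lines]
```

The resulting (label, text) pairs could then be written out as HTML with real 
heading elements, which is the kind of hierarchy screen readers can navigate.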

On Tue, Apr 30, 2024, 11:59 Eigeldinger Simon 
<simon.eigeldin...@hohenems.at> wrote:
Hi all,

I just want to update the info I have about Tesseract.
I need an OCR program that can recognize text in scanned documents.
These are in JPG or multipage PDF format.
Pages may be upside down.
They might also contain images, tables and headings.
Can I recognize those pages out of the box with Tesseract?
Can Tesseract also recognize tables and headings?
A few years ago, one had to preprocess the images first.
Is this still the case?

Greetings,
Simon
--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/a6d49be8cccb40c287d480a9e0053807%40hohenems.at.
