Hi Misti,

Thanks for the info. I will have a look at that. Yes, getting a good picture as a blind person isn't all that easy. Which output format is likely to preserve the most formatting, headings and other structure? hOCR?
Greetings,
Simon

From: tesseract-ocr@googlegroups.com <tesseract-ocr@googlegroups.com> On behalf of Misti Hamon
Sent: Tuesday, 30 April 2024 20:44
To: tesseract-ocr@googlegroups.com
Subject: Re: [tesseract-ocr] Using Tesseract as an OCR solution for blind people

Image quality matters. Upside-down or sideways images really need to be rotated first. That is easy to do without loading an image editor; you just need to get into the JPG's metadata.

It sounds like you are processing textbooks to turn them into something a screen reader can manage? Headers and such get recognized. You will probably have to post-process the Tesseract results: screen readers like hierarchical formats, so you will have to look at the formats Tesseract provides and see whether one can be fed directly to a screen reader. It has problems with charts and tables, especially if they have a background color or row or column striping. If the images you are working with aren't evenly lit, or they are low DPI, there will be problems too. (Personal experience here; I have been processing textbook-format books myself.)

On Tue, Apr 30, 2024, 11:59 Eigeldinger Simon <simon.eigeldin...@hohenems.at> wrote:

Hi all,

I just want to update the info I have about Tesseract. I need an OCR program that can recognize text in scanned documents. Those are in JPG or multipage PDF format. Pages may be upside down. They also might contain images, tables and headings. Can I recognize those pages out of the box with Tesseract? Can Tesseract also recognize tables and headings? A few years ago one needed to preprocess the images first. Is this still the case?

Greetings,
Simon

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/a6d49be8cccb40c287d480a9e0053807%40hohenems.at.
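
For the two practical points raised in this thread (rotating pages first, and choosing an output format that keeps structure), here is a minimal Python sketch. It is not something either poster used. It assumes the `tesseract` command-line program is installed along with its `osd` language data, that orientation detection is requested with `--psm 0`, and that the report contains a `Rotate: <degrees>` line, which may differ between Tesseract versions. The function names and file names are made up for illustration.

```python
import re
import subprocess


def osd_command(image_path):
    """Build a command asking Tesseract for orientation/script detection only."""
    # Assumption: "--psm 0" selects OSD-only mode and "stdout" prints the report.
    return ["tesseract", image_path, "stdout", "--psm", "0"]


def parse_rotation(osd_text):
    """Return the rotation in degrees from an OSD report, or None if absent."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def hocr_command(image_path, output_base, lang="eng"):
    """Build a command that writes <output_base>.hocr for one page image."""
    # "hocr" is the name of a Tesseract output config; "-l" picks the language.
    return ["tesseract", image_path, output_base, "-l", lang, "hocr"]


def detect_rotation(image_path):
    """Run the OSD command; requires the tesseract binary to be installed."""
    result = subprocess.run(
        osd_command(image_path), capture_output=True, text=True, check=True
    )
    return parse_rotation(result.stdout)
```

A page would first go through `detect_rotation("page.jpg")`, be rotated with any image tool if the result is not 0, and then be passed to `subprocess.run(hocr_command("page.jpg", "page"), check=True)`. hOCR is HTML with layout annotations such as bounding boxes for blocks, paragraphs, lines and words. It does not mark headings as such, so heading levels would still have to be inferred in a post-processing step.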