Re: [tesseract-ocr] Text extraction

2024-10-12 Thread Zdenko Podobny
Hello, tesseract is the OCR *engine*, which can handle images with simple layouts like book pages. For images with complex layouts (e.g. tables, a lot of graphics), you need to combine it with other tools for preprocessing (identifying text areas, removing graphics) and postprocessing (layout rec

[tesseract-ocr] Text extraction

2024-10-11 Thread Omar Sherif
I want to extract the text in the attached image while preserving its structure, but I didn't find anything about that in the documentation.

[tesseract-ocr] Text extraction failure after preprocessing.

2024-06-27 Thread 'uday kaipa' via tesseract-ocr
Hi, I have an image containing the number 96 (the image may contain any number between 0 and 100). PFA. I have tried Tesseract PSM values from 6 to 13, and the image size, font, and everything else look good to me. The text is recognized as 36. When I try to adjust padding or other pre-processing, it would work for th

[tesseract-ocr] Text Extraction from Table

2018-11-17 Thread Soumen Seth
Hi Everyone, I am working on: Python 2.7 Pytesseract Tesseract version - tesseract 4.0.0-beta.1 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0 I am trying to extract text from a table, but I've

[tesseract-ocr] Text Extraction from complex Table

2018-11-17 Thread Soumen Seth
Hi Everyone, I am working on *python 2.7* and *pytesseract*. My Tesseract version - tesseract 4.0.0-beta.1 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0 I am trying to extract text from a tab

Re: [tesseract-ocr] text extraction problem with tesseract for the image

2016-11-24 Thread Allistair
By figure text, do you mean "Figure 1: figure supplement 1 Vera et al."? If so I would do a two-pass approach of cropping out the clearly separated top right figure text, then resizing it to Tesseract-friendly resolution, then OCR it. It worked for me (macOS, ImageMagick, Tesseract 3.04.01) ...
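
The crop, resize, then OCR sequence described above might be scripted roughly as follows. This is only a sketch, not the commands the poster actually ran: the crop box, the 300% scale factor, and the file names are hypothetical placeholders, and it assumes the ImageMagick `convert` and `tesseract` command-line tools are on the PATH (newer ImageMagick releases may expose the tool as `magick` instead).

```python
import shutil
import subprocess


def crop_resize_cmd(src, dst, box, scale_percent):
    """Build an ImageMagick command that crops `box` from src and scales it.

    box is (x, y, width, height) in pixels; the values are up to the caller.
    """
    x, y, w, h = box
    return [
        "convert", src,
        "-crop", f"{w}x{h}+{x}+{y}",   # geometry: WIDTHxHEIGHT+X+Y
        "+repage",                      # drop the old canvas offset after cropping
        "-resize", f"{scale_percent}%",
        dst,
    ]


def ocr_cmd(image):
    """Build a Tesseract command that prints the recognized text to stdout."""
    return ["tesseract", image, "stdout"]


def ocr_region(src, box, scale_percent=300, tmp="region.png"):
    """Crop and enlarge one region of src, then OCR only that region."""
    if shutil.which("convert") is None or shutil.which("tesseract") is None:
        raise RuntimeError("ImageMagick and Tesseract must be on PATH")
    subprocess.run(crop_resize_cmd(src, tmp, box, scale_percent), check=True)
    result = subprocess.run(
        ocr_cmd(tmp), check=True, capture_output=True, text=True
    )
    return result.stdout
```

A call such as `ocr_region("figure.png", (600, 0, 400, 100))` would OCR a hypothetical 400x100 pixel region in the top right of the image; the right box and scale depend on the image and have to be found by inspection.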