Hi,

have a look at this example:
article:
https://iamrajatroy.medium.com/document-intelligence-series-part-2-transformer-for-table-detection-extraction-80a52486fa3
notebook:
https://nbviewer.org/github/iamrajatroy/Data-Science-Lab/blob/main/notebook/DETR_Document_Intelligence.ipynb

Zdenko


On Sat, 21 Dec 2024 at 6:56, Riccardo <riccardo.degioan...@gmail.com> wrote:

> Hello,
> I am trying to use Tesseract to create a small Windows application that
> allows the user to:
>
>    - Take a screenshot of the monitor and crop a smaller region
>    containing a table (the table always has the same layout and the labels
>    are consistent; only the numerical values change each time).
>    - Provide the screenshot to Tesseract to extract the data. My strategy
>    is to remove the vertical and horizontal lines in the table, extract the
>    entire text, and collect the numerical values corresponding to the labels I
>    want to capture.
>    - Finally, generate a text output based on the extracted data.
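The line-removal step described above can be sketched with NumPy alone. This is a minimal sketch, not the script attached to the original message, assuming the crop has been converted to an 8-bit grayscale array with dark text on a light background and that the table rules span most of the crop's width or height. The function name `remove_table_lines` and both thresholds are hypothetical and would need tuning on real screenshots.

```python
import numpy as np


def remove_table_lines(gray, dark_thresh=128, line_fraction=0.6):
    """Return a copy of a grayscale uint8 image with long table rules painted white.

    A row (or column) counts as a rule when more than `line_fraction` of its
    pixels are darker than `dark_thresh` (both values are assumptions to tune);
    rows of ordinary text are usually much sparser than that.
    """
    dark = gray < dark_thresh                      # True where a pixel is "ink"
    row_is_rule = dark.mean(axis=1) > line_fraction
    col_is_rule = dark.mean(axis=0) > line_fraction
    cleaned = gray.copy()                          # leave the input untouched
    cleaned[row_is_rule, :] = 255                  # erase horizontal rules
    cleaned[:, col_is_rule] = 255                  # erase vertical rules
    return cleaned


# Synthetic 50x50 "table": one horizontal rule, one vertical rule, one text blob.
demo = np.full((50, 50), 255, dtype=np.uint8)
demo[10, :] = 0
demo[:, 20] = 0
demo[30:33, 30:35] = 0
out = remove_table_lines(demo)
print(out[10].min(), out[:, 20].min(), out[31, 32])  # -> 255 255 0
```

With Pillow, `np.asarray(Image.open(path).convert("L"))` gives the grayscale array and `Image.fromarray(cleaned)` converts the result back for `pytesseract.image_to_string`. If the rules are short or broken (for example lines that stop before the crop edge), OpenCV's morphological opening with long, thin kernels (`cv2.getStructuringElement` plus `cv2.morphologyEx`) is the more common and more robust variant of the same idea.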
>
> The app works fine, but there are still many errors in the data extraction.
> Sometimes a value is not extracted at all because its label is not
> recognized correctly. Other times the labels are recognized correctly and
> the data are extracted, but the numbers are wrong. I also noticed that the
> error rate is higher on my work PC, probably because its screen resolution
> is lower than that of my home PC.
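Values that go missing because a label is misread can often be recovered by matching labels fuzzily instead of exactly. A minimal sketch, assuming each table row comes out of Tesseract as one line of the form `label value`. The label names in `EXPECTED_LABELS`, the number pattern, the 0.75 similarity cutoff and the function name are all hypothetical and would need adapting to the real table. Only the standard library is used (`re` and `difflib.get_close_matches`).

```python
import difflib
import re

# Hypothetical labels: replace with the labels that appear in the real table.
EXPECTED_LABELS = ("Pressure", "Temperature", "Flow rate")

# A number such as 12, -3 or 1,234.5 that is not glued to a preceding letter,
# so an OCR slip like "Fl0w" is not mistaken for a value.
NUMBER_RE = re.compile(r"(?<![\w.])-?\d[\d,]*(?:\.\d+)?")


def extract_values(ocr_text, labels=EXPECTED_LABELS, cutoff=0.75):
    """Map each expected label to the first number on the best-matching OCR line."""
    lookup = {label.lower(): label for label in labels}
    results = {}
    for line in ocr_text.splitlines():
        match = NUMBER_RE.search(line)
        if match is None:
            continue  # no numeric value on this line
        # Everything before the number is treated as the (possibly garbled) label.
        label_text = line[:match.start()].strip(" :\t").lower()
        best = difflib.get_close_matches(label_text, list(lookup), n=1, cutoff=cutoff)
        if best:
            label = lookup[best[0]]
            # Keep the first value found per label; drop thousands separators.
            results.setdefault(label, float(match.group().replace(",", "")))
    return results


print(extract_values("Pressure: 12.5\nTemperatvre 1,234\nFl0w rate  -3\nNoise 99"))
# -> {'Pressure': 12.5, 'Temperature': 1234.0, 'Flow rate': -3.0}
```

A misread label such as `Temperatvre` still maps to `Temperature`, while an unrelated line such as `Noise 99` is ignored. This does not fix digits that Tesseract itself misreads; for the lower resolution on the work PC, upscaling the crop (for example 2x to 3x with Pillow's `Image.resize`) before OCR is a common first thing to try, although the best factor has to be found empirically.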
>
> I am wondering if there is a more reliable way to accomplish my goal.
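Because the table always has the same layout, another option is to skip label recognition altogether: crop each value cell at fixed relative coordinates and OCR it on its own, restricted to a single line of digits. A minimal sketch, assuming the screenshot crop always frames the table the same way, so relative coordinates carry over between the two PCs. The cell coordinates in `CELL_LAYOUT` and the function names are hypothetical. `--psm 7` (treat the image as a single text line) and `tessedit_char_whitelist` are standard Tesseract options, passed here through pytesseract's `config` argument.

```python
# Hypothetical cell layout as fractions of the table crop's width/height:
# (label, left, top, right, bottom). Measure these once on a reference crop.
CELL_LAYOUT = (
    ("Pressure", 0.60, 0.10, 0.95, 0.30),
    ("Temperature", 0.60, 0.40, 0.95, 0.60),
)

# --psm 7: treat each cell as one text line; the whitelist limits the output
# to digits, the decimal point and a minus sign.
DIGIT_CONFIG = "--psm 7 -c tessedit_char_whitelist=0123456789.-"


def cell_boxes(width, height, layout=CELL_LAYOUT):
    """Convert relative cell coordinates into pixel boxes (left, top, right, bottom)."""
    return {
        label: (round(l * width), round(t * height), round(r * width), round(b * height))
        for label, l, t, r, b in layout
    }


def read_cells(image, layout=CELL_LAYOUT):
    """OCR each value cell separately; `image` is a PIL.Image of the table crop."""
    import pytesseract  # imported lazily so cell_boxes() works without Tesseract

    values = {}
    for label, box in cell_boxes(image.width, image.height, layout).items():
        cell = image.crop(box)
        values[label] = pytesseract.image_to_string(cell, config=DIGIT_CONFIG).strip()
    return values


print(cell_boxes(200, 100))
```

`read_cells(Image.open(path))` would return the raw string for each label, which can then be validated (for example with `float()`) before generating the text output. Placing the boxes a few pixels inside the cell borders also keeps the table lines out of the OCR input without a separate line-removal step.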
>
> Below I have attached some images of the app to give you an idea, along
> with an example of the table and the Python script I am using for OCR.
>
> Thank you very much for your help!!!
>
> Tesseract v5.4.0.20240606
>
> Python 3.13.1
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/191d869f-9ff0-4297-b539-aad42fc3c1e3n%40googlegroups.com
>

