Hi, have a look at this example:

article: https://iamrajatroy.medium.com/document-intelligence-series-part-2-transformer-for-table-detection-extraction-80a52486fa3
notebook: https://nbviewer.org/github/iamrajatroy/Data-Science-Lab/blob/main/notebook/DETR_Document_Intelligence.ipynb

Zdenko

On Sat, 21 Dec 2024 at 6:56, Riccardo <riccardo.degioan...@gmail.com> wrote:

> Hello,
> I am trying to use Tesseract to create a small Windows application that
> allows the user to:
>
> - Take a screenshot of the monitor and crop a smaller portion
>   containing a table (the table always has the same format, and the labels
>   are consistent; the numerical data are different each time).
> - Provide the screenshot to Tesseract to extract the data. My strategy
>   is to remove the vertical and horizontal lines in the table, extract the
>   entire text, and collect the numerical values corresponding to the labels I
>   want to capture.
> - Finally, generate a text output based on the extracted data.
>
> The app works fine, but there are still many errors in data extraction.
> Sometimes, some values are not extracted at all because the label is not
> correctly recognized. Other times, even if the labels are recognized
> correctly and the data are extracted, the numbers are incorrect. Also, I
> noticed that the error rate is higher on my work PC, probably because the
> screen resolution is lower than on my home PC.
>
> I am wondering if there is a more reliable way to accomplish my goal.
>
> Below I attached some images of the app to give you an idea, an example of
> the table, and the Python script I am using for OCR.
>
> Thank you very much for your help!!!
>
> tesseract v5.4.0.20240606
>
> Python 3.13.1
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/191d869f-9ff0-4297-b539-aad42fc3c1e3n%40googlegroups.com
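
For the last step described in the quoted message (collecting the numerical values that belong to known labels), a minimal sketch might look like the one below. It is not the script attached to the original post. It assumes the OCR text is already available as a string and that each label and its value end up on the same line. The label names (`Pressure`, `Flow rate`, `Temperature`) and the helper names are hypothetical. Fuzzy matching with the standard-library `difflib` is a swapped-in idea for tolerating slightly misrecognized labels, and the `cutoff` value is a guess that would need tuning.

```python
import difflib
import re

# Hypothetical labels; replace with the fixed labels of the real table.
EXPECTED_LABELS = ["Pressure", "Flow rate", "Temperature"]

# A number with optional sign and optional decimal part (comma or dot).
# Assumption: labels themselves contain no digits.
NUMBER_RE = re.compile(r"[-+]?\d+(?:[.,]\d+)?")


def parse_line(line):
    """Split one OCR line into (label_text, value), or return None."""
    match = NUMBER_RE.search(line)
    if match is None:
        return None
    # Everything before the first number is treated as the label;
    # leftover separators such as ':' or '|' are stripped.
    label_text = line[:match.start()].strip(" :|\t")
    if not label_text:
        return None
    value = float(match.group().replace(",", "."))
    return label_text, value


def extract_values(ocr_text, labels=EXPECTED_LABELS, cutoff=0.7):
    """Map each known label to the first number found on its line."""
    lowered = {label.lower(): label for label in labels}
    results = {}
    for line in ocr_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        label_text, value = parsed
        # Accept the closest known label if it is similar enough.
        close = difflib.get_close_matches(
            label_text.lower(), list(lowered), n=1, cutoff=cutoff
        )
        if close:
            results.setdefault(lowered[close[0]], value)
    return results


sample = "Pressvre: 12,5\nFlow rate | 3.40\nnoise line\nTemperature 21"
values = extract_values(sample)
print(values)
```

Here the misread label `Pressvre` is still mapped to `Pressure`, and lines without a number are skipped. This only helps with misrecognized labels; wrong digits would still need better input images or a different extraction approach.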