First of all, it is good practice to provide a test case (working code plus input and output).

Next, there have been improvements in how bounding boxes are computed (e.g. https://github.com/tesseract-ocr/tesseract/commit/3a5e5089343798932d9952628acfdf56f3108c43), so to get the old results you will need to make a custom build with the respective commits reverted.
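A test case for this kind of report could look roughly like the sketch below. It is only an illustration, not an official tool: the helper names (`parse_bboxes`, `diff_bboxes`, `run_ocr`) are made up for this example, and the regular expression assumes that each hOCR element carries its `id` attribute before the `title` attribute holding `bbox x0 y0 x1 y1`. The idea is to run `run_ocr` on the same image in both environments, save the hOCR output, and then list exactly which boxes differ.

```python
import re

# Assumption: each hOCR element has an id attribute followed by a title
# attribute that contains "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(
    r"<\w+[^>]*?\bid=['\"]([^'\"]+)['\"][^>]*?\bbbox (\d+) (\d+) (\d+) (\d+)"
)


def parse_bboxes(hocr):
    """Map each hOCR element id to its (x0, y0, x1, y1) bounding box."""
    return {
        m.group(1): tuple(int(v) for v in m.groups()[1:])
        for m in BBOX_RE.finditer(hocr)
    }


def diff_bboxes(old_hocr, new_hocr):
    """Return {element id: (old box, new box)} for every box that differs."""
    old, new = parse_bboxes(old_hocr), parse_bboxes(new_hocr)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


def run_ocr(image_path):
    """Return (version string, hOCR text) for one image using tesserocr."""
    # Imported here so the parsing helpers work without tesserocr installed.
    from tesserocr import PyTessBaseAPI, tesseract_version

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        return tesseract_version(), api.GetHOCRText(0)
```

Posting the input image, the two hOCR files and the output of `diff_bboxes` together with the version strings makes it much easier for others to reproduce the difference.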
Zdenko

On Mon, 27 Feb 2023 at 8:03, Prashant Sharma <prashantsharma1...@gmail.com> wrote:

> Hi All,
>
> I am trying to upgrade the software versions of an in-house text extraction
> application developed with Python, the tesserocr Python module and the
> Tesseract OCR software, as below:
>
> - Existing software versions (outdated): Python (v3.6.5) +
>   tesserocr (v2.4.0) + tesseract OCR (v4)
> - Target software versions (latest): Python (v3.10.7) +
>   tesserocr (v2.5.2) + tesseract OCR (v5)
>
> However, the two sets of versions above give different results in terms of
> bounding box coordinates, text extraction results (minor changes), and
> other numerical metadata when calling the GetHOCRText method.
>
> I need to get exactly the same extraction result in terms of metadata
> (e.g. bounding boxes), as I have some dependencies after the text
> extraction; hence the metadata needs to be the same with the upgraded
> software.
>
> Could you please advise?
>
> Regards,
> Prashant Sharma
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/59de7622-bb9d-4aa2-8b86-686b3d63f639n%40googlegroups.com