First of all, it is good manners to provide a test case (working code plus
input and output).
Next, there have been improvements to the bounding boxes (e.g.
https://github.com/tesseract-ocr/tesseract/commit/3a5e5089343798932d9952628acfdf56f3108c43),
so to reproduce the old results you will need to make a custom build with
the respective commits reverted.
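
For illustration, a test case of the kind requested above might look like the
sketch below. It is an assumption-laden example, not code from this thread:
the helper names (dump_hocr, word_boxes, diff_boxes) and the tolerance
parameter are hypothetical, and it assumes word ids line up between the two
runs, which may not hold if the newer version segments the page differently.
Only dump_hocr needs tesserocr; the comparison itself uses the standard
library.

```python
import re

# Word-level hOCR spans look roughly like:
#   <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>
TAG_RE = re.compile(r"<span\b[^>]*\bocrx_word\b[^>]*>")
ID_RE = re.compile(r"\bid=['\"]([^'\"]+)['\"]")
BBOX_RE = re.compile(r"\bbbox (\d+) (\d+) (\d+) (\d+)")


def dump_hocr(image_path):
    """Return (tesseract version string, hOCR text) for one image.

    Run this once in the old environment and once in the new one,
    and save both outputs next to the input image.
    """
    # Imported lazily so the comparison helpers work without tesserocr.
    from tesserocr import PyTessBaseAPI, tesseract_version

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        return tesseract_version(), api.GetHOCRText(0)


def word_boxes(hocr):
    """Map each word id in an hOCR string to its (x0, y0, x1, y1) box."""
    boxes = {}
    for tag in TAG_RE.findall(hocr):
        id_match = ID_RE.search(tag)
        bbox_match = BBOX_RE.search(tag)
        if id_match and bbox_match:
            boxes[id_match.group(1)] = tuple(int(v) for v in bbox_match.groups())
    return boxes


def diff_boxes(old_hocr, new_hocr, tolerance=0):
    """Return words whose box moved by more than `tolerance` pixels.

    Only word ids present in both outputs are compared.
    """
    old, new = word_boxes(old_hocr), word_boxes(new_hocr)
    changed = {}
    for word_id in sorted(old.keys() & new.keys()):
        pairs = zip(old[word_id], new[word_id])
        if any(abs(a - b) > tolerance for a, b in pairs):
            changed[word_id] = (old[word_id], new[word_id])
    return changed
```

Posting the input image together with the two saved hOCR outputs and the
result of diff_boxes makes it clear exactly which boxes changed and by how
much. If the downstream code can tolerate a shift of a pixel or two, a
non-zero tolerance may be a simpler route than a custom build.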

Zdenko


On Mon, 27 Feb 2023 at 8:03, Prashant Sharma <prashantsharma1...@gmail.com>
wrote:

> Hi All,
>
> I am trying to upgrade the software versions of an in-house text extraction
> application built with Python, the tesserocr Python module and the
> Tesseract OCR engine, as below:
>
>
>
>    - Existing software versions (outdated) : Python (v3.6.5) +
>    tesserocr (v2.4.0) + tesseract OCR (v4)
>    - Target software versions   (latest)   : Python (v3.10.7) +
>    tesserocr (v2.5.2) + tesseract OCR (v5)
>
>
> However, I get different results from the same software at the two sets of
> versions above when calling the GetHOCRText method: the bounding box
> coordinates, the extracted text (minor changes), and other numerical
> metadata all differ.
>
> I need exactly the same extraction result in terms of metadata
> (e.g. bounding boxes), because later processing steps depend on it, so
> the upgraded software must produce the same metadata.
>
> Could you please advise?
>
> Regards,
> Prashant Sharma
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/59de7622-bb9d-4aa2-8b86-686b3d63f639n%40googlegroups.com
>
