Hi,

This is common. Code that works properly on a local machine may not behave the same way once it is dockerised. Please check the following:
1. Check that the Tesseract version built in your Dockerfile is the same as the one on your local machine.
2. If the input is a PDF, try changing the version of the pdf2image library in the requirements file.
3. Enable debugging, build the Docker image on your local machine, open the container, and copy the debug images to your local machine. Compare them with the debug images produced by running the code locally.
4. You can then identify which stage has the issue and start working on it.
5. Most likely some preprocessing corrections will give proper results in Docker.

All the best

On Fri, 16 Sep, 2022, 12:06 pm Gabriel Sousa, <gabr...@gsousa.com.br> wrote:

> Hi there,
>
> I'm new to this group, and also new to using Tesseract in general.
>
> We use py-tesseract for a few data extraction cases at the company I
> work for, and for no apparent reason Tesseract text extraction stopped
> working from one deploy to the next.
>
> It should extract a word such as 'JANUARY' - and it used to do this just
> fine, but now it reads 'J ANUARY'. The service is running in a Docker
> container. So everything is using the same version from the last time all
> the tests passed, and now it just breaks when testing something that was
> not altered at all.
>
> Not sure if this is even possible, to me this just seems impossible, but
> I've gone as far as I can in terms of checking code and dependencies and
> nothing fixes the problem. Outside the container, everything works just
> fine; Tesseract only returns a wrong value INSIDE the container...
>
> Anyway... any thoughts on how to look for a fix, or any ideas
> overall, are welcome.
>
> Thank you
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/90c76920-8375-4c28-9d99-8f8f8249a14fn%40googlegroups.com
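
P.S. For the version checks in points 1 and 2, a small script can make the comparison less manual. This is only a rough sketch, not something taken from your setup. It assumes the `tesseract` binary is on the PATH in both environments and that the Python packages of interest are pytesseract, pdf2image and Pillow (that package list is my guess, so adjust it). The function names are made up for illustration. Run it once locally and once inside the container, then compare the two outputs.

```python
import json
import shutil
import subprocess
from importlib import metadata


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr, so check both streams.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


def package_version(name):
    """Return the installed version of a Python package, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_report(packages=("pytesseract", "pdf2image", "Pillow")):
    """Gather the versions that are worth comparing across environments."""
    report = {"tesseract": tesseract_version()}
    for name in packages:
        report[name] = package_version(name)
    return report


def diff_reports(local, container):
    """Return {key: (local_value, container_value)} for every mismatch."""
    keys = sorted(set(local) | set(container))
    return {
        key: (local.get(key), container.get(key))
        for key in keys
        if local.get(key) != container.get(key)
    }


if __name__ == "__main__":
    # Print the report as JSON so it can be saved and compared later.
    print(json.dumps(collect_report(), indent=2))
```

If you save the JSON output from each environment, loading both files and passing them to `diff_reports` lists only the entries that differ, which is usually the quickest place to start looking.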