Hi zdenop, thanks for the tip, but I'm using the ProcessPage*s* function, which should write the header and footer of the file itself. I did also experiment with calling BeginDocument() before ProcessPage() and EndDocument() after it, and the resulting file differs significantly. Sadly, it is still corrupt.
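To tell a truncated file apart from one that is broken in some other way, a small structural check can help. The following is only a minimal sketch: `LooksLikeCompletePdf` and `ReadFileBytes` are hypothetical helper names, not part of Tesseract or gosseract. It only looks for the `%PDF-` header and the `startxref` / `%%EOF` trailer markers that a complete PDF carries, and it does not validate the file in any deeper sense.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole file into memory as raw bytes.
// Returns an empty string if the file cannot be opened.
std::string ReadFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: true if the buffer starts with the "%PDF-" header
// and has "startxref" followed by "%%EOF" within its last 1024 bytes.
// This is a plausibility check for truncation, not a PDF validator.
bool LooksLikeCompletePdf(const std::string& data) {
    if (data.compare(0, 5, "%PDF-") != 0) {
        return false;  // missing or damaged header
    }
    // The trailer markers sit at the very end, so only inspect the tail.
    const std::size_t tail_len = 1024;
    const std::size_t start =
        data.size() > tail_len ? data.size() - tail_len : 0;
    const std::string tail = data.substr(start);

    const std::size_t xref = tail.rfind("startxref");
    const std::size_t eof = tail.rfind("%%EOF");
    return xref != std::string::npos && eof != std::string::npos &&
           xref < eof;
}
```

Calling `LooksLikeCompletePdf(ReadFileBytes(path))` on the generated file and on the PDF produced by the tesseract CLI should show whether the trailer is missing from one of them.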
So the problem seems to lie in the failing BeginDocument/EndDocument functions. But even there I'm seeing strange behaviour. Using only EndDocument(), I get something like a footer at the end of the file:

[image: screenshot of the end of the generated PDF]

But it stops abruptly at "Produce". When I use BeginDocument(), ProcessPage() and then EndDocument(), the file ends with raw bytes and there is no "endstream" or "endobj".

I've updated to the latest 4.1.3 version, but the problem still exists. I updated the bug branch at https://github.com/dnnspaul/gosseract/tree/tesseract/bug%2F3652 so the problem is reproducible. To disable BeginDocument, one has to comment out https://github.com/dnnspaul/gosseract/blob/tesseract/bug/3652/tessbridge.cpp#L187. I tried to use the code from the tesseract CLI 1:1, but it still does not work...

zdenop wrote on Sunday, 21 November 2021 at 13:18:52 UTC+1:

> seems like the same problem as
> https://github.com/sirfz/tesserocr/issues/271#issuecomment-919334885
>
> Did you use BeginDocument EndDocument ?
>
> Zdenko
>
> On Sun, 21 Nov 2021 at 9:27, 'blaumedia' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>
>> Described already in issue:
>> https://github.com/tesseract-ocr/tesseract/issues/3652
>>
>> I'm trying to generate a searchable PDF from a JPG image, but the file that gets output is an invalid PDF that can't be read by any PDF reader.
>>
>> I have added a Docker image for reproducing the problem in the issue, but here is the bash snippet for it:
>>
>> *git clone g...@github.com:dnnspaul/gosseract.git*
>> *git checkout tesseract/bug/3652*
>>
>> *docker build -t tessbug .*
>> *docker run -it -v $PWD/tmp:/tmp tessbug go run main.go*
>>
>> When I feed the file to the tesseract CLI, the resulting PDF is readable, but I can't find any difference between the CLI and my snippet.
>>
>> Thanks in advance for any help! I'm sorry, I'm more of a Go developer than a C++ developer, so I struggle with even the simplest syntax, but I tried my best.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ad68ab2c-2d45-47c3-9194-5d1cd8ea8400n%40googlegroups.com.