I do not understand your question: how is it related to the topic under discussion?
Zdenko

On Mon, 22 Nov 2021 at 14:34, Sarah Jane CHANNEL <kangchitan2...@gmail.com> wrote:

> Can this code read text?
>
> On Mon, 22 Nov 2021, 21:28 Zdenko Podobny, <zde...@gmail.com> wrote:
>
>> Here is simple code that works for me (with tesseract 5 and leptonica
>> 1.82):
>>
>> #include <leptonica/allheaders.h>
>> #include <tesseract/baseapi.h>
>> #include <tesseract/renderer.h>
>> #include <cerrno>
>> #include <cstdio>
>> #include <cstdlib>
>> #include <cstring>
>> #include <string>
>>
>> int main() {
>>   const char* datapath = "f:/Project-Personal/tessdata_best/tessdata";
>>   std::string language_ = "eng";
>>   std::string inputFile_ = "input.png";
>>   const char* outputbase = "output";
>>
>>   tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>>   if (api->Init(datapath, language_.c_str(), tesseract::OEM_LSTM_ONLY)) {
>>     fprintf(stderr, "Could not initialize tesseract.\n");
>>     exit(1);
>>   }
>>
>>   PIX *sourceImg = pixRead(inputFile_.c_str());
>>   if (!sourceImg) {
>>     fprintf(stderr, "Leptonica can't process input file: %s\n",
>>             inputFile_.c_str());
>>     return EXIT_FAILURE;
>>   }
>>   api->SetImage(sourceImg);
>>   api->SetInputName(inputFile_.c_str());
>>   api->SetOutputName(outputbase);
>>
>>   tesseract::TessPDFRenderer* renderer =
>>       new tesseract::TessPDFRenderer(outputbase, api->GetDatapath());
>>   if (!renderer->happy()) {
>>     printf("Error, could not create PDF output file: %s\n",
>>            strerror(errno));
>>     delete renderer;
>>     return EXIT_FAILURE;  // do not go on to use the deleted renderer
>>   }
>>
>>   bool succeed = api->ProcessPages(inputFile_.c_str(), nullptr, 0,
>>                                    renderer);
>>   if (!succeed) {
>>     fprintf(stderr, "Error during processing.\n");
>>     return EXIT_FAILURE;
>>   }
>>
>>   delete renderer;  // release the renderer once processing is done
>>   api->End();
>>   delete api;
>>   pixDestroy(&sourceImg);
>>   return 0;
>> }
>>
>>
>> Zdenko
>>
>>
>> On Sun, 21 Nov 2021 at 23:16, 'blaumedia' via tesseract-ocr <
>> tesseract-ocr@googlegroups.com> wrote:
>>
>>> Hi zdenop,
>>>
>>> thanks for your tip, but I'm using the ProcessPage*s* function, so it
>>> should write the head and footer part of the file itself.
>>> BUT I've played a bit with ProcessPage() + BeginDocument() before and
>>> EndDocument() after and the resulting file has big differences. Sadly, the
>>> file is still corrupt.
>>>
>>> So it seems the problem is based on the failing begin/enddocument
>>> function. But even there I'm experiencing mysterious bugs.
>>> Using only EndDocument(), I have something like a footer at the end of
>>> the file:
>>> [image: r3mxpijfjkxk073pmzquqm343_testjpg.pdf_ff — gosseract [SSH:
>>> root.debdocker.home.blaumedia.com]_2021-11-21 23-07-40_MacPro.png]
>>>
>>> But it suddenly stops at "Produce". But when I'm using BeginDocument(),
>>> ProcessPage() and then EndDocument() the file is ending with bytes and
>>> there is no "endstream" or "endobj".
>>> I've updated to the latest 4.1.3 version but the problem still exists.
>>>
>>> I updated the bug branch in
>>> https://github.com/dnnspaul/gosseract/tree/tesseract/bug%2F3652 so the
>>> problem is reproducible.
>>> To disable the BeginDocument, one has to comment out
>>> https://github.com/dnnspaul/gosseract/blob/tesseract/bug/3652/tessbridge.cpp#L187
>>> .
>>>
>>> I tried to use the code from the tesseract cli 1:1 but it still does not
>>> work...
>>>
>>> zdenop wrote on Sunday, 21 November 2021 at 13:18:52 UTC+1:
>>>
>>>> seems like the same problem as
>>>> https://github.com/sirfz/tesserocr/issues/271#issuecomment-919334885
>>>>
>>>> Did you use BeginDocument / EndDocument?
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sun, 21 Nov 2021 at 9:27, 'blaumedia' via tesseract-ocr <
>>>> tesser...@googlegroups.com> wrote:
>>>>
>>>>> Described already in issue:
>>>>> https://github.com/tesseract-ocr/tesseract/issues/3652
>>>>>
>>>>> I'm trying to generate a searchable PDF from a jpg image, but
>>>>> the file that gets output is an invalid pdf file that can't be read by any
>>>>> pdf reader.
>>>>>
>>>>> I have added a docker image for reproduction of the problem in the
>>>>> issue, but here is the bash snippet for it:
>>>>>
>>>>> git clone g...@github.com:dnnspaul/gosseract.git
>>>>> cd gosseract
>>>>> git checkout tesseract/bug/3652
>>>>>
>>>>> docker build -t tessbug .
>>>>> docker run -it -v $PWD/tmp:/tmp tessbug go run main.go
>>>>>
>>>>> When I pass the file to the tesseract cli, the resulting pdf is
>>>>> readable, but I can't find any difference between the cli and my snippet.
>>>>>
>>>>> Thanks in advance for any help! I'm very sorry, I'm more of a GoLang
>>>>> developer than a C++ developer, so I have problems with even the
>>>>> simplest syntax, but I tried my best.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/f34562d3-d11e-4385-9c78-b24092413dean%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/f34562d3-d11e-4385-9c78-b24092413dean%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
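
For anyone reproducing the symptom described in the thread (a PDF that ends mid-stream, with no closing part), a quick check on the generated file can tell a truncated PDF from a complete one. The sketch below is not Tesseract code: the helper names looksLikeCompletePdf and fileLooksLikeCompletePdf are hypothetical, and the 1024-byte tail window is an assumption. It only looks for the "%PDF-" header and the "startxref" / "%%EOF" markers that close a PDF file, so a file that passes may still be invalid in other ways.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of Tesseract or Leptonica).
// Returns true when `data` starts with a PDF header and its last bytes
// contain "startxref" followed by "%%EOF", the markers that close a PDF file.
// This is a truncation check only, not a full validity check.
bool looksLikeCompletePdf(const std::string& data) {
  if (data.compare(0, 5, "%PDF-") != 0) {
    return false;  // missing or damaged header
  }
  const std::size_t tailLen = 1024;  // assumed window for the trailer markers
  const std::size_t start = data.size() > tailLen ? data.size() - tailLen : 0;
  const std::string tail = data.substr(start);
  const std::size_t xref = tail.rfind("startxref");
  const std::size_t eof = tail.rfind("%%EOF");
  return xref != std::string::npos && eof != std::string::npos && xref < eof;
}

// Reads the whole file in binary mode and applies the check above.
// Returns false when the file cannot be opened.
bool fileLooksLikeCompletePdf(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;
  }
  const std::string data((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
  return looksLikeCompletePdf(data);
}
```

Calling fileLooksLikeCompletePdf("output.pdf") after the renderer has been destroyed should return false for a file that stops at "Produce" or ends in raw stream bytes, which points at the document never being finished or flushed rather than at the OCR step.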