Shree is one of the most experienced, and definitely the most helpful, members of this group. I have also seen Zdenko answering some questions. You might have good luck with either of them.
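Until then, one workaround for the O/0 mix-up in the BTW number quoted below, which does not depend on whitelist support, is to correct the OCR text afterwards. Here is a minimal sketch in Python, assuming the number always follows the Dutch VAT layout (NL, nine digits, B, two digits). The function name `fix_btw_number` and the look-alike table are purely illustrative and are not part of Tesseract:

```python
import re

# Letters that OCR commonly confuses with digits (illustrative, not exhaustive).
LETTER_TO_DIGIT = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})

# Loose pattern: "NL", 9 digit-like characters, "B", 2 digit-like characters.
# Assumption: the Dutch VAT (BTW) layout NL#########B##.
BTW_CANDIDATE = re.compile(r"\bNL([0-9OoIl]{9})B([0-9OoIl]{2})\b")


def fix_btw_number(text: str) -> str:
    """Replace digit look-alikes, but only inside BTW-number candidates."""

    def repair(match: re.Match) -> str:
        body = match.group(1).translate(LETTER_TO_DIGIT)
        tail = match.group(2).translate(LETTER_TO_DIGIT)
        return f"NL{body}B{tail}"

    return BTW_CANDIDATE.sub(repair, text)


if __name__ == "__main__":
    # Prints: uw BTW nummer:: NL007900000B01
    print(fix_btw_number("uw BTW nummer:: NLOO7900000B01"))
```

Because the substitution only happens inside strings that already match the VAT layout, other text on the invoice (for example the "Tel" line) passes through unchanged. This is a post-processing patch, not a fix for the recognition itself.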
On Friday, September 22, 2023 at 4:07:12 PM UTC+3 Des Bw wrote:

> If you have an income source, you might be able to offer some compensation
> for his/her time, and an experienced user or even a developer might help
> you fine-tune the software for your needs. You could ask Shree whether
> he/she would be interested.
>
> On Friday, September 22, 2023 at 3:24:52 PM UTC+3 powe...@gmail.com wrote:
>
>> Well, I have approximately 3000 customers at the moment for our software.
>> We run OCR on lots of invoices, e.g. one customer processes approx. 10,000
>> documents a month.
>> So open source is worth it. I want Tesseract, since it is free to use.
>> I believe open source is the future.
>>
>> So, can somebody help me optimize it?
>>
>> By lots of CPU usage I mean that it may need more CPU for some
>> parameter like "super quality". I want to use that parameter.
>>
>> On Friday, 22 September 2023 at 14:03:53 UTC+2, desal...@gmail.com wrote:
>>
>>> The CPU usage is unusual. I have a pretty old Mac (from 2011) and have
>>> been running Tesseract quite fine.
>>> But as to accuracy: if your project is limited in scale, the
>>> commercial tools would definitely perform better for you. If you have
>>> long-lasting, extensive projects, though, Tesseract is worth spending
>>> your time on and developing (training).
>>>
>>> On Friday, September 22, 2023 at 2:50:50 PM UTC+3 powe...@gmail.com wrote:
>>>
>>>> Well, the problem is: why does it choose
>>>> NLOO7900000B01
>>>> [image: Lambregts0001 - cleaned - btwnr.jpg]
>>>> with the letter O twice and the digit 0 (zero) five times?
>>>>
>>>> Google Vision result: "NL007900000B01"
>>>>
>>>> Nuance / OmniPage: "NL007900000B01"
>>>>
>>>> Leadtools demo: "NL007900000B01"
>>>>
>>>> I want to use Tesseract, but I guess I need things like a "second pass"
>>>> or "preprocessing", no dictionary, etc.
>>>> So I would rather have a CPU usage of 99.99% than super speed.
>>>>
>>>> Can somebody help me?
>>>>
>>>> On Friday, 22 September 2023 at 13:25:21 UTC+2, desal...@gmail.com wrote:
>>>>
>>>>> Apparently, version 4 doesn't support whitelisting:
>>>>> https://groups.google.com/g/tesseract-ocr/c/IBbQIQpdSpE
>>>>> That is not good.
>>>>>
>>>>> On Friday, September 22, 2023 at 2:23:39 PM UTC+3 Des Bw wrote:
>>>>>
>>>>>> The difference between zero and O is deeply problematic, even for the
>>>>>> human eye. Some fonts make it even harder.
>>>>>> You can try the method used here, if that helps:
>>>>>> https://pyimagesearch.com/2021/09/06/whitelisting-and-blacklisting-characters-with-tesseract-and-python/
>>>>>>
>>>>>> On Friday, September 22, 2023 at 9:43:51 AM UTC+3 powe...@gmail.com wrote:
>>>>>>
>>>>>>> I found the parameters:
>>>>>>> "C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
>>>>>>> It is not working: "uw BTW nummer:: NLOO7900000B01"
>>>>>>>
>>>>>>> Any other ideas?
>>>>>>>
>>>>>>> On Thursday, 21 September 2023 at 22:25:12 UTC+2, elvi...@gmail.com wrote:
>>>>>>>
>>>>>>>> Whitelist the digits so that the O will not confuse it.
>>>>>>>> You can also try --psm 13 if all of your texts are single-line.
>>>>>>>>
>>>>>>>> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi.
>>>>>>>>> I am trying to use the Tesseract engine instead of the Nuance
>>>>>>>>> engine.
>>>>>>>>> When I currently run tesseract.exe on the image, it returns a few
>>>>>>>>> strange characters:
>>>>>>>>> 2x OO instead of 00
>>>>>>>>> "uw BTW nummer:: NLOO7900000B01"
>>>>>>>>> instead of
>>>>>>>>> "uw BTW nummer:: NL007900000B01"
>>>>>>>>> and
>>>>>>>>> "Tel £01"
>>>>>>>>> instead of
>>>>>>>>> "Tel : 01"
>>>>>>>>> but "Tel : 0168-452452" is recognized OK.
>>>>>>>>>
>>>>>>>>> I see no improvement from following
>>>>>>>>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>>>>>>>>> because these are really clean documents.
>>>>>>>>>
>>>>>>>>> Am I missing some parameters? Like a second run, or a more accurate
>>>>>>>>> run, etc.
>>>>>>>>> Maybe compile tesseract.exe myself with different, higher-quality
>>>>>>>>> parameters?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alwin
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com