I found a site that uses tesseract and it does VERY well with nature and numbers.
When I run tesseract myself, I do NOT get the same results that they do.

I've attached an example image from which I clearly need to get 1433 for this team member.

When I use your OCR (https://www.imagetotext.info/) it says “AMERICAN FAMILY INSURANC 1433”, which is great!

When I run tesseract, I get trash on the same image:

c:\>tesseract.exe 12749691.jpg stdout -l eng --psm 6 --oem 3
we As eee ┬╗ Ate ae ├⌐ FAS ; Z cae f . if\ iy i * ΓÇÖ . | TPE xX * dp

What are we doing wrong? Do I need to run a program before tesseract to isolate areas? What are we missing?

Any help would be great!

Thanks,
Tim Net

On Thursday, December 1, 2022 at 2:33:58 AM UTC-5 jackf...@gmail.com wrote:

> Hi Tim, below is the code
>
> import java.io.File;
> import net.sourceforge.tess4j.Tesseract;
> import net.sourceforge.tess4j.TesseractException;
>
> Tesseract tesseract = new Tesseract();
> try {
>     // the path of your tess data folder inside the extracted file
>     tesseract.setDatapath("tessdata");
>     // path of your image file
>     String text = tesseract.doOCR(new File("c:\\temp\\TSP_12484529.JPG"));
>     System.out.print(text);
> } catch (TesseractException e) {
>     e.printStackTrace();
> }
>
> https://www.geeksforgeeks.org/tesseract-ocr-with-java-with-examples/
>
> Text:
>
> “linge
> — jer bed eh i
> ' ad ~m & .] i — ——
> ; om +. Yi * . ¥ ™ * :
> i q 7 OW ne ~ .~ ia ‘, " i
> : 77s 7 US we 1 oe
> Ps - a F 7. ' a
> 3 7 e si * +” " sa = : *
> | es _” nr
> es , , ;
> TH + Fo } a
> : ‘ = ee ey W 4 p =e ( \\\ 3 1
> Ea 7 * fii Y7 A (s . \
> . '
> si 7. } WH ‘.@ oh j
> : : t Y p fn we ¢ “a « a Pe,
> ca ¥v a RELAY ii ae : P yi te
> ae é a Sa "Pu
> )
> 8 oe —_ - z Z J
> . * ! a es rs
> S Q Alm» 4 , + : ‘
> & fa ts !. i
> - | so ix 2108 A As Cees
> ¥ 2 @ —— j
> a : ‘
> a ; % . + a
> a i a . * & : 4 1” 2
> “ * a - ema we q
> vg a . i = oe ,
> ad - | e < P
> , ' ee oe:
> s a a a a4 me , “ 3
> . ae : wal” +
> : | mS wip cus ot
> @ Ps él : — oe e a a -,
> ' mY —- ; —_ -F « La
> ’ , ; ~ - ~ Y _ ,
> e ; a . Y 7 . a 2 ~
> » a= ; oa , fas . ’ mi
> . ** -< ‘ » = a | ~ e : * d
> : J . S -_ , ne os e 4 reat . ¥ ;
> - f , a - & , ‘ “- c LL. : y ~.
> ae <
> On gn ne Ae a yA
> a v. od on ie . mm
> A J a-< 5 7 Oe .
> ig ee Es 2 ee
> ft SOE ode Zi ee ae DP ee) |
> ga . s>* 4 "
> ; “4 fF SR, +
> zt Jx-e cot % “i | : A Pa c L/
> ne ; ao “ \] oo TT. ¥ ae : “ J J
> Se , Sw og a 2 ~~
> i A ile PPE POP we
> : . j \ . - 7 J 4 y 4 pe Jk 4 “ae >
> ee awe ee = ot, % ~ thy i 4 é c& ie’ 4 lies
> ’ ; J ©
> Ree sR rh JEP
> =. << a —
> y £ wit 2 . = 5 A y _s-
> ——e t f ~ by ae? ‘ Ft wer
> x : <. >» re of Fie
> i = | s fe“ mee =. Se Le i, f
> — : _ - er | Ps Perk g
> OT Au ik Prova) SAE: eee et
> » Te eel - = As f a = - = 5 a hee De = - CS7 “g oy
> ‘ : , = ~ ( ‘ : = gre = at a VG he ws aed
> « - «?f-=- \| i = ¥ } ‘ c wry we
> - ‘ ) ‘i \ f F ae - Cue thy
> (cite i —_ 4 \ eS a Et |
> * it a= - -_ o> , _ a) ee
> <—& _ree.
> =x = ws Ls he ,
> : 2 FS ae ect. - a 5
> See 2
> — ee. er LE é 2 ~~ pln 9
> Zs be ee ee, : j a eB ; Ze ; ; Z ae 3 - = a : i
> SEE - Zs Te Se ee Do 2 CEE, ae
>
> On Wed, 30 Nov 2022 at 20:09, Tim Nettleton <t...@truespeedphoto.com> wrote:
>
>> I do not understand: is tesseract not capable of doing this?
>> Is there no input or guidance that anyone can offer to get closer to a
>> solution with tesseract?
>>
>> Tim
>>
>> On Wednesday, November 30, 2022 at 1:17:48 AM UTC-5 zdenop wrote:
>>
>>> Tesseract is an OCR engine. You need to search for "text detection from
>>> natural scenes", e.g.:
>>>
>>> https://scholar.google.com/scholar?q=text+detection+in+natural+scenes&as_sdt=0&as_vis=1&oi=scholart
>>> https://www.sciencedirect.com/science/article/pii/S1877050922001867
>>> https://d1wqtxts1xzle7.cloudfront.net/46580188/gao_jiang_2001_2-with-cover-page-v2.pdf
>>>
>>> Good luck
>>>
>>> Zdenko
>>>
>>>
>>> On Wed, 30 Nov 2022 at 7:01, Tim Nettleton <t...@truespeedphoto.com> wrote:
>>>
>>>> I've tried many combinations of -psm and -l but can't extract numbers
>>>> from JPGs.
>>>>
>>>> I have about 100 pictures of people running and they are wearing a bib
>>>> #.
>>>> I've had a little more success if I convert to tif, but not really.
>>>>
>>>> What parameters should I use to simply extract numbers from pics like this?
>>>>
>>>> I'm a bit overwhelmed.
>>>>
>>>> Tim
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/9614c183-f252-403e-a39f-e3bea8ab637bn%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f898bf46-96e4-45a6-b316-e28dc0f6b24bn%40googlegroups.com.
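
For anyone landing on this thread with the same question ("do I need to run a program before tesseract to isolate areas?"): Zdenko's reply points that way, since tesseract expects an image that already shows mostly text. Below is a rough Java sketch of that idea, not a tested solution for these photos. It assumes you already know roughly where the bib is (the crop rectangle is typed in by hand here and stands in for a real text detector), converts that crop to black and white with Otsu's threshold, and then calls the tesseract command line with `--psm 7` (single text line) and a digits-only `tessedit_char_whitelist`. The class name `BibOcrSketch`, its method names, and the output file `bib_crop.png` are made up for illustration. The whitelist is reportedly ignored by the LSTM engine in tesseract 4.0, so results may depend on your version.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical helper class; not part of tesseract or Tess4J.
class BibOcrSketch {

    /** Grayscale values (0..255) of the rectangle (x, y, w, h), stored row by row. */
    static int[] grayCrop(BufferedImage src, int x, int y, int w, int h) {
        int[] gray = new int[w * h];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                int rgb = src.getRGB(x + col, y + row);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[row * w + col] = (299 * r + 587 * g + 114 * b) / 1000; // integer luma
            }
        }
        return gray;
    }

    /** Otsu's method: the threshold that maximizes the between-class variance. */
    static int otsuThreshold(int[] gray) {
        long[] hist = new long[256];
        for (int v : gray) hist[v]++;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += i * (double) hist[i];
        long wBg = 0;
        double sumBg = 0, bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wBg += hist[t];                      // pixels with value <= t
            if (wBg == 0) continue;
            long wFg = gray.length - wBg;        // pixels with value > t
            if (wFg == 0) break;
            sumBg += t * (double) hist[t];
            double diff = sumBg / wBg - (sumAll - sumBg) / wFg;
            double var = (double) wBg * wFg * diff * diff;
            if (var > bestVar) { bestVar = var; best = t; }
        }
        return best;
    }

    /** Pixels above the threshold become white, the rest black. */
    static BufferedImage binarize(int[] gray, int w, int h, int threshold) {
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int i = 0; i < gray.length; i++) {
            out.setRGB(i % w, i / w, gray[i] > threshold ? 0xFFFFFF : 0x000000);
        }
        return out;
    }

    /** tesseract CLI arguments: one text line, digits only. */
    static List<String> digitsCommand(String imagePath) {
        return Arrays.asList("tesseract", imagePath, "stdout",
                "--psm", "7", "-c", "tessedit_char_whitelist=0123456789");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 5) {
            System.out.println("usage: BibOcrSketch <image> <x> <y> <w> <h>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) throw new IOException("cannot read image: " + args[0]);
        int x = Integer.parseInt(args[1]), y = Integer.parseInt(args[2]);
        int w = Integer.parseInt(args[3]), h = Integer.parseInt(args[4]);
        int[] gray = grayCrop(src, x, y, w, h);
        BufferedImage bw = binarize(gray, w, h, otsuThreshold(gray));
        ImageIO.write(bw, "png", new File("bib_crop.png")); // hypothetical output name
        new ProcessBuilder(digitsCommand("bib_crop.png")).inheritIO().start().waitFor();
    }
}
```

You would run it as something like `java BibOcrSketch 12749691.jpg <x> <y> <w> <h>` with the bib's rectangle filled in. For 100 photos the rectangle would have to come from a text-detection step of the kind Zdenko's links describe; the sketch only shows what to do with the region once you have it.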