I found a site that uses tesseract and it does VERY well with natural scenes 
and numbers.

When I use tesseract I do NOT get the same results that they do.
 
I’ve attached an example image; clearly I need to get 1433 for this 
team member.
When I use the site's OCR (https://www.imagetotext.info/) it says “AMERICAN 
FAMILY INSURANC 1433”, which is great!
 
When I run tesseract, I get trash on the same image:
 
c:\>tesseract.exe 12749691.jpg stdout -l eng --psm 6 --oem 3
we As eee »
Ate ae
é FAS
; Z cae f .
if\ iy
i * ’ .
| TPE
xX * dp
 
What are we doing wrong? 
Do I need to run a program before tesseract to isolate areas?
What are we missing?
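
To make the question concrete, a pre-step of that kind might look like the sketch below. It is only an illustration under assumptions: the class name BibPreprocess, the method prepare, and the crop rectangle, scale factor and threshold are all hypothetical, and the position of the bib is assumed to be known already (finding it automatically is a separate problem). It uses only the standard Java imaging classes to crop the region, enlarge it, convert it to grayscale and binarize it.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class BibPreprocess {

    /**
     * Crops the rectangle (x, y, w, h) out of src, enlarges it by an integer
     * factor, converts it to grayscale and applies a fixed threshold.
     * All parameter values are assumptions to be tuned per image set.
     */
    public static BufferedImage prepare(BufferedImage src, int x, int y, int w, int h,
                                        int scale, int threshold) {
        // Isolate the area that is expected to contain the number.
        BufferedImage crop = src.getSubimage(x, y, w, h);

        // Drawing into a TYPE_BYTE_GRAY image scales and converts to grayscale in one step.
        BufferedImage gray = new BufferedImage(w * scale, h * scale, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(crop, 0, 0, w * scale, h * scale, null);
        g.dispose();

        // Fixed-threshold binarization: dark pixels become 0, the rest 255.
        WritableRaster raster = gray.getRaster();
        for (int py = 0; py < gray.getHeight(); py++) {
            for (int px = 0; px < gray.getWidth(); px++) {
                int lum = raster.getSample(px, py, 0);
                raster.setSample(px, py, 0, lum < threshold ? 0 : 255);
            }
        }
        return gray;
    }
}
```

The result could be saved with javax.imageio.ImageIO.write and passed to tesseract. On a tight crop like this, --psm 7 (single text line) and -c tessedit_char_whitelist=0123456789 may be worth trying, though whether they help on these photos is untested.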

Any help would be great!
 
Thanks,
 
Tim Net

On Thursday, December 1, 2022 at 2:33:58 AM UTC-5 jackf...@gmail.com wrote:

> Hi Tim, the code is below:
>
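> import java.io.File;
> import net.sourceforge.tess4j.Tesseract;
> import net.sourceforge.tess4j.TesseractException;
>
> public class OcrExample {
>     public static void main(String[] args) {
>         Tesseract tesseract = new Tesseract();
>         try {
>             // Path of the tessdata folder inside the extracted archive
>             tesseract.setDatapath("tessdata");
>             // Path of the image file to recognize
>             String text = tesseract.doOCR(new File("c:\\temp\\TSP_12484529.JPG"));
>             System.out.print(text);
>         } catch (TesseractException e) {
>             e.printStackTrace();
>         }
>     }
> }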
>
> https://www.geeksforgeeks.org/tesseract-ocr-with-java-with-examples/
>
> Text:
>
> “linge
> — jer bed eh i
> ' ad ~m & .] i — ——
> ; om +. Yi * . ¥ ™ * :
> i q 7 OW ne ~ .~ ia ‘, " i
> : 77s 7 US we 1 oe
> Ps - a F 7. ' a
> 3 7 e si * +” " sa = : *
> | es _” nr
> es , , ;
> TH + Fo } a
> : ‘ = ee ey W 4 p =e ( \\\ 3 1
> Ea 7 * fii Y7 A (s . \ > . '
> si 7. } WH ‘.@ oh j
> : : t Y p fn we ¢ “a « a Pe,
> ca ¥v a RELAY ii ae : P yi te
> ae é a Sa "Pu
> ) > 8 oe —_ - z Z J
> . * ! a es rs
> S Q Alm» 4 , + : ‘
> & fa ts !. i
> - | so ix 2108 A As Cees
> ¥ 2 @ —— j
> a : ‘
> a ; % . + a
> a i a . * & : 4 1” 2
> “ * a - ema we q
> vg a . i = oe ,
> ad - | e < P
> , ' ee oe:
> s a a a a4 me , “ 3
> . ae : wal” +
> : | mS wip cus ot
> @ Ps él : — oe e a a -,
> ' mY —- ; —_  -F « La
> ’ , ; ~ - ~  Y _ ,
> e ; a . Y 7 . a 2 ~
> » a= ; oa , fas . ’ mi
> . ** -< ‘ » = a | ~ e : * d
> : J . S -_ , ne os e 4 reat . ¥ ;
> - f , a - & , ‘ “- c LL. : y ~. ae <
> On gn ne Ae a yA
> a v. od on ie . mm > A J a-< 5 7 Oe .
> ig ee Es 2 ee
> ft SOE ode Zi ee ae DP ee) |
> ga . s>* 4 " > ; “4 fF SR, +
> zt Jx-e cot % “i | : A Pa c L/
> ne ; ao “ \] oo TT. ¥ ae : “ J J
> Se , Sw og a 2 ~~
> i A ile PPE POP we
> : . j \ . - 7 J 4 y 4 pe Jk 4 “ae >
> ee awe ee = ot, % ~ thy i 4 é c& ie’ 4 lies > ’ ; J ©
> Ree sR rh JEP
> =. << a — > y £ wit 2 . = 5 A y _s-
> ——e t f ~ by ae? ‘ Ft wer
> x : <. >» re of Fie
> i = | s fe“ mee =. Se Le i, f
> — : _ - er | Ps Perk g
> OT Au ik Prova) SAE: eee et
> » Te eel - = As f a = - = 5 a hee De = - CS7 “g oy
> ‘ : , = ~ ( ‘ : = gre = at a VG he ws aed
> « - «?f-=- \| i = ¥ } ‘ c wry we
> - ‘ ) ‘i \ f F ae - Cue thy
> (cite i —_ 4 \ eS a Et |
> * it a= - -_ o> , _ a) ee > <—&  _ree. > =x = ws Ls he ,
> : 2 FS ae ect. - a 5
> See 2 > — ee. er LE é 2 ~~ pln 9
> Zs be ee ee, : j a eB ; Ze ; ; Z ae 3 - = a : i
> SEE - Zs Te Se ee Do 2 CEE, ae
>
> On Wed, 30 Nov 2022 at 20:09, Tim Nettleton <
> t...@truespeedphoto.com> wrote:
>
>> I do not understand: is tesseract not capable of doing this?
>> Is there no input or guidance that anyone can offer to get closer to a 
>> solution with tesseract?
>>
>> Tim
>>
>> On Wednesday, November 30, 2022 at 1:17:48 AM UTC-5 zdenop wrote:
>>
>>> Tesseract is an OCR engine; it is not designed to find text in photographs. 
>>> You need to search for "text detection in natural scenes", e.g.:
>>>
>>> https://scholar.google.com/scholar?q=text+detection+in+natural+scenes&as_sdt=0&as_vis=1&oi=scholart
>>> https://www.sciencedirect.com/science/article/pii/S1877050922001867
>>>
>>> https://d1wqtxts1xzle7.cloudfront.net/46580188/gao_jiang_2001_2-with-cover-page-v2.pdf
>>> Good luck.
>>>
>>> Zdenko
>>>
>>>
>>> On Wed, 30 Nov 2022 at 7:01, Tim Nettleton <t...@truespeedphoto.com> 
>>> wrote:
>>>
>>>> I've tried many combinations of --psm and -l but can't extract numbers 
>>>> from JPGs.
>>>>
>>>> I have about 100 pictures of people running, each wearing a bib 
>>>> number.
>>>> I've had slightly more success after converting to TIFF, but not much.
>>>>
>>>> What parameters should I use to simply extract numbers from pictures like this?
>>>>
>>>> I'm a bit overwhelmed.
>>>>
>>>> Tim
>>>>
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/9614c183-f252-403e-a39f-e3bea8ab637bn%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/9614c183-f252-403e-a39f-e3bea8ab637bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>

