Can someone guide me on how to apply regex to tesseract output to capture 
certain keywords? What is the best approach to catch certain keywords and their 
corresponding values? Any advice or help is really appreciated. 
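
To illustrate what I am after, here is a rough sketch using Python's re module 
on OCR text. The sample text, the field names and the helper extract_fields are 
all made up for illustration; in practice the text would come from something 
like pytesseract.image_to_string(), and real OCR output would likely need 
looser patterns.

```python
import re

# Made-up OCR text for illustration; real text would come from the OCR step.
ocr_text = """Invoice No: INV-1042
Date : 01/12/2021
Total: 1,250.00
"""


def extract_fields(text, keywords):
    """Return {keyword: value} for each 'keyword: value' line that is found."""
    found = {}
    for key in keywords:
        # Keyword, optional spaces, a colon, then the rest of that line.
        pattern = re.compile(rf"{re.escape(key)}\s*:\s*(.+)", re.IGNORECASE)
        match = pattern.search(text)
        if match:
            found[key] = match.group(1).strip()
    return found


# Hypothetical keywords; keywords that do not occur are simply left out.
fields = extract_fields(ocr_text, ["Invoice No", "Date", "Total"])
print(fields)
```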


Regards,

> On 1 Dec 2021, at 3:58 PM, Quang Linh <linhnq.hal...@gmail.com> wrote:
> 
> You can change int() to float().
> 
> On Friday, 17 September 2021 at 14:32:58 UTC+7, zdenop wrote:
> It is not tesseract that produces conf as a string, but pytesseract.
> 
> You cannot convert a float string to an int directly in Python. Check your Python 
> tutorial for how to do it correctly.
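
A minimal sketch of that conversion, assuming the 'conf' entries are strings as 
in the output quoted below (depending on the pytesseract version they may 
already be numbers, which float() also accepts). The helper name parse_conf is 
made up for illustration.

```python
def parse_conf(value):
    """Convert a 'conf' entry (str, int or float) to a float."""
    return float(value)


# Sample values copied from the quoted image_to_data output.
conf = ['-1', '-1', '-1', '-1', '90.214363', '-1', '77.749153', '61.677670']
text = ['', '', '', '', 'TESTANDO', '', 'O', 'OCR...']

# Rows for blocks, paragraphs and lines report -1; keep only real words.
words = [(t, parse_conf(c)) for t, c in zip(text, conf) if parse_conf(c) >= 0]

# If an integer is really needed, go through float first.
# int() truncates the fractional part; use round() to round instead.
as_int = [int(c) for _, c in words]

print(words)
print(as_int)
```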
> 
> On Fri, 17 Sep 2021, 09:27 Brunitus Nishimura, <bruni...@gmail.com> wrote:
> Could anyone explain why tesseract returns "conf" as a string?
> import pytesseract
> import cv2
> from pytesseract import Output
>  
> pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
> img = cv2.imread(r"C:\Users\Documents\Python\OCR\Programa\teste_manuscrito_01.jpg")
> rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
>  
> resultado = pytesseract.image_to_data(rgb, lang='por', output_type=Output.DICT)
>  
> print(resultado)
>  
> {'level': [1, 2, 3, 4, 5, 4, 5, 5],
> 'page_num': [1, 1, 1, 1, 1, 1, 1, 1],
> 'block_num': [0, 1, 1, 1, 1, 1, 1, 1],
> 'par_num': [0, 0, 1, 1, 1, 1, 1, 1],
> 'line_num': [0, 0, 0, 1, 1, 2, 2, 2],
> 'word_num': [0, 0, 0, 0, 1, 0, 1, 2],
> 'left': [0, 38, 38, 38, 38, 102, 102, 307],
> 'top': [0, 79, 79, 79, 79, 228, 233, 228],
> 'width': [700, 607, 607, 607, 607, 532, 77, 327],
> 'height': [400, 236, 236, 92, 92, 87, 76, 87],
> 'conf': ['-1', '-1', '-1', '-1', '90.214363', '-1', '77.749153', '61.677670'],
> 'text': ['', '', '', '', 'TESTANDO', '', 'O', 'OCR...']}
> 
> ///////////////////////////////////////////
> 
> confianca = int(resultado['conf'] [i])
> 
> ValueError: invalid literal for int() with base 10: '90.214363'
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to tesseract-oc...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/6339b18a-95a6-4085-abff-7a0e25af4cd4n%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/tesseract-ocr/6339b18a-95a6-4085-abff-7a0e25af4cd4n%40googlegroups.com?utm_medium=email&utm_source=footer>.
> 
