To improve the results, I applied Canny edge detection and the Hough Line Transform to the images, then fed the binarized image to Tesseract.
The call was:

text = pytesseract.image_to_string(cropped_frame, lang='eng', config='--psm 6 --oem 3')

The results have improved a bit, but are still far from perfect. Negative signs are being omitted, and some of them are being misread as ~. Some decimal points are also being dropped: 22.5 was extracted as 225.

On Friday, May 31, 2024 at 1:07:01 PM UTC+5:30 jun.r...@gmail.com wrote:

> It's hard to give an opinion without seeing how you set up Tesseract: what PSM
> did you specify, etc.?
>
> On Friday 31 May 2024 at 02:34:36 UTC+12 sanvib...@gmail.com wrote:
>
>> I have provided the image from which I am trying to extract text using
>> Tesseract OCR (input.jpeg). Along with that, I have also provided the
>> text extracted from the image. As can be seen from the images, the
>> extracted text is not very accurate. Negative signs have been omitted,
>> and there are some unwanted characters in the extracted text. (I have
>> marked some of the incorrect results with blue boxes.)
>>
>> I have tried to improve the results by preprocessing the images and
>> changing the parameters of the model. I have tried:
>>
>> 1. Binarizing the images
>>
>> 2. HDR processing of the images
>>
>> Even then, such inconsistencies remain.
>>
>> How can I improve the detection and extraction of text in Tesseract? I have
>> also tried PaddleOCR for the same task. Even then, symbols such as the euro
>> sign and some negative signs are not being detected.
>>