If I hadn't done any research, how would I know that Tesseract needs image 
preprocessing? I am new to OCR and still learning, so please be kind and help. Thanks :)

On Saturday, January 27, 2024 at 3:26:40 PM UTC+5 zdenop wrote:

> What about reading the docs and a little bit of googling?
>
> tesseract two-page-passport-mrz-detected.jpeg - --psm 6 -l mrz
>
> IDAUT10000999<6<<<<<<<<<<<<<<<
> 7109094F1112315AUT<<<<<<<<<<<6
> MUSTERFRAU<<ISOLDE<<<<<<<<<<<<
>
>
> Zdenko
>
>
> On Sat, 27 Jan 2024 at 11:19, sara waheed <sarawah...@gmail.com> wrote:
>
>> I am trying to read the passport MRZ string from an image. I am using 
>> Tesseract for OCR and OpenCV for image processing. I have tried three 
>> different approaches, and none of them worked.
>>
>> **Attempt 1**
>> I have this image. When I run OCR on it, Tesseract reads it as
>>
>>     IDAUT10000999<6<<<<<<<<<<<<<<<
>>     7109094F1112315AUT<<<<<<xcc<<6
>>     MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>>
>> which is incorrect: it reads some of the `<` filler characters as x, c or k. 
>> When I use the `mrz-java` library to parse the details from the string, it 
>> gives the following error
>>
>>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 IDAUT10000999<6<<<<<<<<<<<<<<<
>>     [error] 7109094F1112315AUT<<<<<<xcc<<6
>>     [error] MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>>     [error]  at 24-25,1: Invalid character in MRZ record: x
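
The `Invalid character in MRZ record` error can be caught before the string reaches the parser. Below is a minimal sketch, assuming the MRZ alphabet is limited to `A`-`Z`, `0`-`9` and the `<` filler, as ICAO Doc 9303 specifies. The names `MrzCharset` and `invalidChars` are hypothetical and not part of `mrz-java`.

```scala
object MrzCharset {
  // Characters permitted in an MRZ: upper-case letters, digits and the '<' filler.
  private def isAllowed(c: Char): Boolean =
    (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '<'

  // Returns (row, column, character) for every character outside the MRZ alphabet.
  // Rows and columns are 0-based.
  def invalidChars(mrz: String): Seq[(Int, Int, Char)] =
    for {
      (line, row) <- mrz.linesIterator.toSeq.zipWithIndex
      (ch, col)   <- line.zipWithIndex
      if !isAllowed(ch)
    } yield (row, col, ch)
}
```

For the Attempt 1 output above, the first entry it reports is `x` at row 1, column 24, the same position named in the `mrz-java` error.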
>>
>> **Attempt 2**
>>
>> Then I converted the image to grayscale and binarized it using `OpenCV`. 
>> Here is the code:
>>
>>     import java.io.File
>>     import net.sourceforge.tess4j.Tesseract
>>     import org.opencv.core.Mat
>>     import org.opencv.imgcodecs.Imgcodecs
>>     import org.opencv.imgproc.Imgproc
>>
>>     val roiImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected.jpeg"
>>
>>     // Convert the MRZ region to grayscale and save it
>>     val grayScaleROI = new Mat()
>>     val roiImage = Imgcodecs.imread(roiImagePath)
>>     Imgproc.cvtColor(roiImage, grayScaleROI, Imgproc.COLOR_BGR2GRAY)
>>     val roiGrayImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-gray.jpeg"
>>     Imgcodecs.imwrite(roiGrayImagePath, grayScaleROI)
>>
>>     // Binarize with an adaptive mean threshold (block size 15, constant 25)
>>     val binary = new Mat()
>>     Imgproc.adaptiveThreshold(grayScaleROI, binary, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 15, 25)
>>     val roiBinaryImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary.jpeg"
>>     Imgcodecs.imwrite(roiBinaryImagePath, binary)
>>
>>     // Run Tesseract on the binarized image and strip spaces from the result
>>     val tesseract = new Tesseract()
>>     tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata")
>>     tesseract.setVariable("user_defined_dpi", "600")
>>     val result = tesseract.doOCR(new File(roiBinaryImagePath))
>>     val mrzStr = result.replace(" ", "")
>>     println(s"two page passport mrz string is: $mrzStr")
>>
>> It created the following binary image.
>>
>> Tesseract reads the MRZ string from the binary image as
>>
>>     IDAUT1DODD999<E<KK<KKKKEKEKEK
>>     7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>>     MUSTERFRAUSKISOLDEKKKKKKKKKKK
>>
>> and `mrz-java` fails to parse this string with the following error
>>
>>     [error] Error parsing MRZ string: Failed to parse MRZ null IDAUT1DODD999<E<KK<KKKKEKEKEK
>>     [error] 7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>>     [error] MUSTERFRAUSKISOLDEKKKKKKKKKKK
>>     [error]  at 0-0,0: Different row lengths: 0: 29 and 1: 30
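
The `Different row lengths` error points at the shape of the OCR output rather than at individual characters. Here is a small sketch of a pre-check, assuming the document is a TD1 card (3 lines of 30 characters, per ICAO Doc 9303), which is the format `mrz-java` detected in Attempt 1. `MrzShape` and `td1Problems` are hypothetical names used only for illustration.

```scala
object MrzShape {
  // TD1 (ID-card size) MRZ: 3 lines of 30 characters each.
  val Td1Lines = 3
  val Td1LineLength = 30

  // Returns human-readable problems; an empty list means the shape looks like TD1.
  def td1Problems(ocrText: String): List[String] = {
    // Drop spaces and blank lines that may surround the recognized text.
    val lines = ocrText.linesIterator.map(_.replace(" ", "")).filter(_.nonEmpty).toList
    val countProblem =
      if (lines.length == Td1Lines) Nil
      else List(s"expected $Td1Lines lines but got ${lines.length}")
    val lengthProblems = lines.zipWithIndex.collect {
      case (line, row) if line.length != Td1LineLength =>
        s"row $row has ${line.length} characters, expected $Td1LineLength"
    }
    countProblem ++ lengthProblems
  }
}
```

A non-empty result means Tesseract dropped or added characters, so the image or the OCR settings need attention before any parsing is attempted.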
>>
>> **Attempt 3**
>>
>> Then I resized the image:
>>
>>     import org.opencv.core.Size
>>
>>     val width = 1000 // Increase width proportionately (adjust based on your needs)
>>     val height = (width * binary.rows()) / binary.cols() // Maintain aspect ratio
>>
>>     val resizedRoiImage = new Mat()
>>     Imgproc.resize(binary, resizedRoiImage, new Size(width, height), 0.0, 0.0, Imgproc.INTER_NEAREST)
>>
>>     val resizedImageROIPath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary-resized_image.jpg"
>>     Imgcodecs.imwrite(resizedImageROIPath, resizedRoiImage)
>>
>> The MRZ string read by Tesseract is
>>
>>     TOAUTIOOOOIISKhcceccccddddddce
>>     FIOPOSAFIFESSISAUTReececeececs
>>     MUSTERFRAUCCKISOLDECKccccdcddd
>>
>> and the error is 
>>
>>     [info] 15:54:04.200 633 [main] MrzParser INFO - Check digit verification failed for document number: expected 0 but got h
>>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 TOAUTIOOOOIISKhcceccccddddddce
>>     [error] FIOPOSAFIFESSISAUTReececeececs
>>     [error] MUSTERFRAUCCKISOLDECKccccdcddd
>>     [error]  at 15-16,0: Invalid character in MRZ record: c
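
The `Check digit verification failed` line refers to the MRZ check digits, which can also be used to judge whether an OCR result is trustworthy. The following is a minimal sketch of the ICAO Doc 9303 check-digit rule (repeating weights 7, 3, 1, with the sum taken modulo 10). `MrzCheckDigit` and `compute` are hypothetical names and not the `mrz-java` API.

```scala
object MrzCheckDigit {
  // Numeric value of one MRZ character: digits keep their value,
  // 'A'..'Z' map to 10..35 and the '<' filler counts as 0.
  private def charValue(c: Char): Option[Int] = c match {
    case d if d >= '0' && d <= '9' => Some(d - '0')
    case l if l >= 'A' && l <= 'Z' => Some(l - 'A' + 10)
    case '<'                       => Some(0)
    case _                         => None
  }

  // Check digit of a field: weighted sum with repeating weights 7, 3, 1, modulo 10.
  // Returns None when the field contains a character outside the MRZ alphabet.
  def compute(field: String): Option[Int] = {
    val weights = Array(7, 3, 1)
    val total = field.zipWithIndex.foldLeft(Option(0)) { case (acc, (ch, i)) =>
      for (sum <- acc; v <- charValue(ch)) yield sum + v * weights(i % 3)
    }
    total.map(_ % 10)
  }
}
```

In a TD1 record the document number occupies characters 6 to 14 of the first line and its check digit is character 15. For the correctly read line, `compute("10000999<")` gives `Some(6)`, which matches the `6` that follows the document number. For the Attempt 3 output, `compute("IOOOOIISK")` gives `Some(0)`, the value the log says was expected where Tesseract produced `h`.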
>>
>>   
>> Can anyone please help me read the text properly? I also tried a regex to 
>> convert c or k back to `<`, but that did not work either. If anyone can 
>> suggest a workaround or an improvement to the code, please help me with that. 
>> Thanks.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/440788ab-1d76-4612-a4b5-a1a4c2cd09a5n%40googlegroups.com
>>
>
