Thanks a lot Zdenko, I am disappointed but that's life :-(

On Monday, November 22, 2021 at 12:42:23 UTC+1, zdenop wrote:
> OCR of source code with tesseract is a problem:
>
> - tesseract is not focused on keeping spaces/indentation - you have to
>   reconstruct it yourself (e.g. by parsing hOCR output)
> - tesseract is focused more on "real" text, while source code is more
>   symbolic, with a lot of extra characters, case sensitivity, etc. So I am
>   quite sure you will need to correct the tesseract output manually.
>
> Zdenko
>
> On Mon, Nov 22, 2021 at 6:54, J S <jszal...@gmail.com> wrote:
>
>> Hi all,
>> I am trying to OCR some code written in Python. I've read the Tesseract doc
>> many times and applied 3 preprocessing scripts with ImageMagick. The
>> resulting image is attached.
>> I then send it to Tesseract with ```--psm 4```, which seems to be the most
>> suitable segmentation mode for what I am trying to do. The result is quite
>> OK, but I don't get indentation and I think it could still be improved.
>>
>> I would be glad to have some advice to improve the result. Thanks a lot
>>
>> Best,
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/c07b4f66-7e6e-4634-a4ee-b8a8db003f20n%40googlegroups.com
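
On the indentation point in the quoted reply: here is a minimal sketch (not something posted in this thread) of how indentation might be rebuilt from hOCR output. It assumes the hOCR was produced with something like `tesseract image.png out --psm 4 hocr`, that the code is set in a monospaced font, and that lines and words carry the `ocr_line` and `ocrx_word` classes with a `bbox` in their `title` attribute. The names `HocrLines` and `reindent` are made up for illustration.

```python
import re
from html.parser import HTMLParser
from statistics import median

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrLines(HTMLParser):
    """Collect the left edge and the words of every ocr_line element."""

    def __init__(self):
        super().__init__()
        self.lines = []        # each entry: [left_x, [word, ...]]
        self.char_widths = []  # per-word estimate of one character's width (px)
        self._in_word = False
        self._word_width = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if not match:
            return
        if "ocr_line" in classes:
            self.lines.append([int(match.group(1)), []])
        elif "ocrx_word" in classes and self.lines:
            self._in_word = True
            self._word_width = int(match.group(3)) - int(match.group(1))
            self.lines[-1][1].append("")

    def handle_data(self, data):
        if self._in_word:
            self.lines[-1][1][-1] += data

    def handle_endtag(self, tag):
        # Words are <span> elements; nested <strong>/<em> tags are ignored.
        if self._in_word and tag == "span":
            word = self.lines[-1][1][-1].strip()
            self.lines[-1][1][-1] = word
            if word:
                self.char_widths.append(self._word_width / len(word))
            self._in_word = False


def reindent(hocr):
    """Return the OCR text with leading spaces estimated from bbox positions."""
    parser = HocrLines()
    parser.feed(hocr)
    parser.close()
    # Keep only lines that actually contain recognised words.
    lines = [(x, [w for w in words if w]) for x, words in parser.lines]
    lines = [(x, words) for x, words in lines if words]
    if not lines or not parser.char_widths:
        return ""
    char_width = median(parser.char_widths)   # assumes a monospaced font
    left_margin = min(x for x, _ in lines)    # leftmost line = column 0
    out = []
    for x, words in lines:
        indent = round((x - left_margin) / char_width)
        out.append(" " * indent + " ".join(words))
    return "\n".join(out)
```

Usage would look like `print(reindent(open("out.hocr", encoding="utf-8").read()))`. The character width is only an estimate (median of word width divided by word length), so noisy boxes can shift a line by a column or two, and spacing inside a line is collapsed to single spaces. The result still needs checking by hand, as Zdenko says.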