Hi all, I have published a test/example of how to use the Tesseract C API from Python 3 via cffi [1].
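For readers who have not used cffi before, here is a rough sketch of the general approach. It is not the code from the notebook, only a minimal illustration under some assumptions: the Tesseract and Leptonica shared libraries are installed, their file names are passed in by the caller (they differ per platform), and an "eng" traineddata file can be found. The name ocr_file and its parameters are made up for this example; the C declarations are a small subset of Tesseract's capi.h and Leptonica's headers.

```python
import os

# Small subset of the C declarations, copied by hand for cffi's ABI mode.
# Only the functions needed for "read image, run OCR, get text" are declared.
TESS_CDEF = """
typedef struct TessBaseAPI TessBaseAPI;
typedef struct Pix PIX;

/* Leptonica */
PIX* pixRead(const char* filename);
void pixDestroy(PIX** ppix);

/* Tesseract C API */
TessBaseAPI* TessBaseAPICreate(void);
void  TessBaseAPIDelete(TessBaseAPI* handle);
int   TessBaseAPIInit3(TessBaseAPI* handle, const char* datapath,
                       const char* language);
void  TessBaseAPISetImage2(TessBaseAPI* handle, struct Pix* pix);
char* TessBaseAPIGetUTF8Text(TessBaseAPI* handle);
void  TessDeleteText(const char* text);
void  TessBaseAPIEnd(TessBaseAPI* handle);
"""


def ocr_file(image_path, tess_lib, lept_lib, lang="eng", tessdata=None):
    """Hypothetical helper: OCR one image file fully in-process.

    tess_lib / lept_lib are the shared-library names or paths for your
    platform (assumption: you know where they are installed).
    """
    # Imported here so the declarations above can be inspected without cffi.
    from cffi import FFI

    ffi = FFI()
    ffi.cdef(TESS_CDEF)
    lept = ffi.dlopen(lept_lib)
    tess = ffi.dlopen(tess_lib)

    api = tess.TessBaseAPICreate()
    try:
        datapath = tessdata.encode() if tessdata else ffi.NULL
        if tess.TessBaseAPIInit3(api, datapath, lang.encode()) != 0:
            raise RuntimeError("could not initialise Tesseract")

        pix = lept.pixRead(os.fsencode(image_path))
        if pix == ffi.NULL:
            raise OSError("Leptonica could not read %r" % (image_path,))
        try:
            tess.TessBaseAPISetImage2(api, pix)
            raw = tess.TessBaseAPIGetUTF8Text(api)
            if raw == ffi.NULL:
                raise RuntimeError("Tesseract returned no text")
            try:
                # Copy the C string into a Python str before freeing it.
                return ffi.string(raw).decode("utf-8")
            finally:
                tess.TessDeleteText(raw)
        finally:
            # pixDestroy expects a PIX**, so wrap the pointer first.
            lept.pixDestroy(ffi.new("PIX**", pix))
    finally:
        tess.TessBaseAPIEnd(api)
        tess.TessBaseAPIDelete(api)
```

Calling ocr_file with an image path and the two library names for your system should return the recognised text as a str. No temporary files are written by the Python side, which is the main difference from wrapping the executable.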
I am aware of the pytesseract module [2], which seems to be widely used. It wraps the tesseract executable, so IMO it may have some limitations, e.g. in performance (it uses disk operations for input and output).

My example is in the form of a Jupyter notebook [3] (GitHub is able to show it, but not run it ;-)), so you can interactively view what is happening.

My aim is not to create a new Tesseract Python wrapper (I do not have time for it, and I am not able to write Python code as nice as pytesseract's :-) ), so it is not robust: I have only tried it on 64-bit Windows, but IMO it should be possible to use it on Linux and Mac with small modifications. If needed, I can add 32-bit Windows libs...

Personally, I would like to have Python modules for tesseract and leptonica that use their APIs directly... I know that James Barlow has already started wrapping leptonica [4], but it is not (yet?) available as an independent module (it is part of OCRmyPDF).

Anyway, I hope this will help somebody.

[1] https://github.com/zdenop/SimpleTesseractPythonWrapper
[2] https://pypi.org/project/pytesseract/
[3] https://github.com/zdenop/SimpleTesseractPythonWrapper/blob/master/SimpleTesseractPythonWrapper.ipynb
[4] https://github.com/jbarlow83/OCRmyPDF/tree/master/src/ocrmypdf/lib

Zdenko