[tesseract-ocr] How to enable tessedit_write_images on pytesseract ?

2023-02-04 Thread Mars
Hello, I am trying to figure out how to make Tesseract more accurate, because I have some issues. In League of Legends' in-game chat the position is fixed and the text color is always the same, but the background changes a lot. So I read how to improve the quality of the output: htt

Re: [tesseract-ocr] How to enable tessedit_write_images on pytesseract ?

2023-02-04 Thread Zdenko Podobny
pytesseract is a wrapper around the tesseract executable, so I suggest using tesseract directly if something goes wrong... "tesseract --help-extra" is your friend. tessedit_write_images should be used this way: "-c tessedit_write_images=1" Zdenko On Sat, 4 Feb 2023 at 9:55, Mars wrote: > Hello, > > I am
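A minimal sketch of the direct call suggested above, in Python using only the standard library. The helper names build_tesseract_cmd and run_tesseract and the file name chat.png are hypothetical, chosen for illustration. It assumes the tesseract executable is on PATH and uses "stdout" as the output base so the recognized text comes back on standard output.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, output_base="stdout", write_images=True):
    """Build the argument list for a direct tesseract call (hypothetical helper)."""
    cmd = ["tesseract", image_path, output_base]
    if write_images:
        # Same option as in the reply: -c tessedit_write_images=1
        cmd += ["-c", "tessedit_write_images=1"]
    return cmd


def run_tesseract(image_path):
    """Run tesseract on image_path and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "chat.png" is a placeholder file name; this only prints the command.
    print(" ".join(build_tesseract_cmd("chat.png")))
```

With pytesseract the same option should be passable through the config argument, for example pytesseract.image_to_string(image, config="-c tessedit_write_images=1"), since that string is forwarded to the command line. In my understanding the processed image is then written as tessinput.tif in the current working directory, but check this against your Tesseract version.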

Re: [tesseract-ocr] How to extract non-text regions

2023-02-04 Thread Zdenko Podobny
The task you mention is called "document layout segmentation" or "document layout analysis" (https://en.wikipedia.org/wiki/Document_layout_analysis). As Muneeb mentioned, you can try https://layout-parser.github.io/, and https://github.com/qurator-spk/eynollah also looks promising. If you would