[tesseract-ocr] Parameters to improve detection of sparse text

2023-04-24 Scaly Green Orc
Hi there, I'm trying to OCR VA charts such as this one: (the text layer is FUBAR, so I'm resorting to OCR). I'm running in sparse text mode (PSM=11). There's a lot of text but I
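For reference, a PSM 11 run like the one described can be sketched as follows. This assumes the `tesseract` CLI (4.x or later) is on `PATH` and that the chart page has already been rendered to an image. The helper names and the 60-point confidence cutoff are hypothetical choices for illustration, not tesseract defaults. Requesting the built-in `tsv` output gives per-word confidences and boxes, which may help separate real labels from fragments picked up in the chart linework.

```python
import csv
import io
import subprocess
from typing import List, Tuple

Word = Tuple[str, float, Tuple[int, int, int, int]]


def tesseract_sparse_argv(image_path: str, lang: str = "eng") -> List[str]:
    """Build a tesseract command for sparse-text mode (PSM 11) with TSV output.

    Using "stdout" as the output base and appending the built-in "tsv" config
    makes tesseract print one tab-separated row per layout element.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", "11", "tsv"]


def parse_words(tsv_text: str, min_conf: float = 60.0) -> List[Word]:
    """Keep word-level rows (level 5) whose confidence is at least min_conf.

    min_conf is a hypothetical cutoff; tune it per chart.
    """
    words: List[Word] = []
    # QUOTE_NONE: recognized text may itself contain quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        if row["level"] != "5" or not text:
            continue  # skip page/block/paragraph/line rows and empty words
        conf = float(row["conf"])
        if conf < min_conf:
            continue  # likely noise from lines, symbols or hatching
        box = (int(row["left"]), int(row["top"]), int(row["width"]), int(row["height"]))
        words.append((text, conf, box))
    return words


def ocr_sparse(image_path: str, min_conf: float = 60.0) -> List[Word]:
    """Run tesseract on one image and return the filtered words."""
    result = subprocess.run(
        tesseract_sparse_argv(image_path), capture_output=True, text=True, check=True
    )
    return parse_words(result.stdout, min_conf)
```

Keeping the bounding boxes makes it possible to check afterwards which areas of the chart produced no confident words at all, rather than only reading the plain-text output.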

Re: [tesseract-ocr] Parameters to improve detection of sparse text

2023-04-25 Scaly Green Orc
On Tuesday, 25 April 2023 at 09:06:20 UTC+2 zdenop wrote: First of all, this input is a regular PDF (i.e. it contains text rather than an image), so IMO it should be easier to extract accurate text from the file instead of OCRing it... Next: tesseract can handle simple layout analysis (e.g. book pag
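Extracting the existing text layer, as suggested here, might look like the sketch below. It assumes poppler's `pdftotext` is installed and on `PATH`; the helper names are hypothetical. `-layout` asks pdftotext to keep the physical placement of the text, which tends to suit charts better than reading order, and `-` as the output file sends the result to stdout. Whether the output is usable still depends on how damaged the text layer is.

```python
import subprocess
from typing import List, Optional


def pdftotext_argv(pdf_path: str, first_page: int = 1, last_page: Optional[int] = None) -> List[str]:
    """Build a poppler pdftotext command that keeps the page layout.

    -f/-l select the first/last page; "-" as the output file means stdout.
    """
    argv = ["pdftotext", "-layout", "-f", str(first_page)]
    if last_page is not None:
        argv += ["-l", str(last_page)]
    argv += [pdf_path, "-"]
    return argv


def extract_text_layer(pdf_path: str, first_page: int = 1, last_page: Optional[int] = None) -> str:
    """Return the PDF's embedded text for the selected pages (no OCR involved)."""
    result = subprocess.run(
        pdftotext_argv(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If the extracted text turns out to be garbled (for example because of a broken font encoding), the OCR route with PSM 11 remains the fallback.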