On Thursday, February 20, 2025 at 11:38:34 PM UTC-5 dooha...@gmail.com wrote:
> *Understanding Tesseract OCR and --psm: Why Removing It Can Improve Accuracy for Scanned Books*

The recommendation to understand how the various page segmentation modes work and to experiment with alternatives is a good one, but "removing" the --psm switch is equivalent to using *--psm 3*, which is the default.

You can get an overview of all the different page segmentation modes by using *--help-psm*:

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
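
For anyone who wants to try the modes side by side, here is a rough Python sketch of that experiment. It is not from the original post: the helper names (tesseract_cmd, effective_psm, compare_modes) are made up for illustration, and the use of "stdout" as the output base, "-l eng", and modes 3, 4 and 6 as the candidates are only assumptions about a typical scanned-book setup. The default of 3 is taken from the help output quoted above.

```python
import subprocess

# Default page segmentation mode, as listed by `tesseract --help-psm`.
DEFAULT_PSM = 3


def tesseract_cmd(image, psm=None, lang="eng"):
    """Build the argument list for one tesseract run that writes text to stdout.

    Passing psm=None leaves the --psm switch off entirely, which tesseract
    treats the same as --psm 3.
    """
    cmd = ["tesseract", image, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def effective_psm(psm=None):
    """Return the mode tesseract will actually use for a given psm setting."""
    return DEFAULT_PSM if psm is None else psm


def compare_modes(image, modes=(3, 4, 6)):
    """Run tesseract once per mode and return {mode: recognized text}.

    Requires the tesseract binary on PATH; raises FileNotFoundError otherwise.
    """
    results = {}
    for psm in modes:
        proc = subprocess.run(
            tesseract_cmd(image, psm),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one mode fails on this image
        )
        results[psm] = proc.stdout
    return results
```

Calling compare_modes("page.png") (with a hypothetical page image) and diffing the returned strings against a hand-checked transcription is one way to see which mode suits a particular book layout. Note that tesseract_cmd("page.png") and tesseract_cmd("page.png", psm=3) should produce the same OCR result, since they select the same mode.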