On Thursday, February 20, 2025 at 11:38:34 PM UTC-5 dooha...@gmail.com 
wrote:

*Understanding Tesseract OCR and --psm: Why Removing It Can Improve 
Accuracy for Scanned Books*

The recommendations to understand how the various page segmentation 
modes work and to experiment with alternatives are good ones, but "removing" 
the --psm switch is equivalent to using *--psm 3*, which is the default.

You can get an overview of all the different page segmentation modes by 
using *--help-psm*:

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
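
If you want to confirm this on your own scans, something like the following should do it. This is only a rough sketch: page.png is a placeholder file name, the ocr helper is made up for illustration, and the modes in the loop (4, 6, 11) are just examples to try, not a recommendation. It assumes Tesseract 4 or later, where the switch is spelled --psm.

```shell
#!/bin/sh
# Sketch: check that omitting --psm matches --psm 3, then try alternatives.
# "page.png" is a placeholder image name, not a file from the original post.
img="${1:-page.png}"

# ocr MODE: print the OCR text for $img (hypothetical helper).
# An empty MODE leaves the --psm switch off entirely.
ocr() {
    if [ -n "$1" ]; then
        tesseract "$img" stdout --psm "$1" 2>/dev/null
    else
        tesseract "$img" stdout 2>/dev/null
    fi
}

if command -v tesseract >/dev/null 2>&1 && [ -f "$img" ]; then
    # No switch vs. the explicit default: the two outputs should be identical.
    if [ "$(ocr)" = "$(ocr 3)" ]; then
        echo "no --psm and --psm 3 agree"
    else
        echo "outputs differ; check the Tesseract version and config"
    fi
    # Character counts give a rough first look at how other modes differ.
    for mode in 4 6 11; do
        printf 'psm %s: %s characters\n' "$mode" "$(ocr "$mode" | wc -c | tr -d ' ')"
    done
else
    echo "tesseract or $img not found; nothing to compare"
fi
```

Character counts are a crude measure, so for a real comparison you would want to diff the text from each mode against a hand-corrected page.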




