You are correct. I was able to resolve this by using these two page 
segmentation modes:
- PSM 6 (single uniform block of text)
- PSM 4 (single column of text of variable sizes)

I use Tesseract from Python and ran into this issue with both 
pytesseract.image_to_data and pytesseract.image_to_string, using 
version 5.2 of Tesseract.
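
For anyone who finds this thread later, here is a minimal sketch of how the 
page segmentation mode can be passed through pytesseract's config argument. 
The helper names psm_config and ocr_lines are my own for illustration, not 
part of pytesseract, and I am assuming Pillow is used to load the image:

```python
from typing import List


def psm_config(psm: int) -> str:
    """Build the Tesseract config string that selects a page segmentation mode."""
    # Tesseract defines page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def ocr_lines(image_path: str, psm: int = 6) -> List[str]:
    """Run OCR with the given PSM and return the non-empty text lines."""
    # Imported here so psm_config() can be used without Pillow/pytesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        text = pytesseract.image_to_string(image, config=psm_config(psm))
    return [line for line in text.splitlines() if line.strip()]
```

Calling ocr_lines("legal_description.png", psm=6) and then again with psm=4 
(the file name is hypothetical) makes it easy to compare how each mode splits 
the lines. The same config string can also be passed to 
pytesseract.image_to_data.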

Thanks

On Wednesday, October 23, 2024 at 10:42:25 AM UTC-4 tfmo...@gmail.com wrote:

> On Wednesday, October 23, 2024 at 1:13:05 AM UTC-4 mattjo...@gmail.com 
> wrote:
>
> I am having an issue with Tesseract splitting text lines incorrectly for 
> the attached file of a metes and bounds legal description.  It returns this:
>
> [...]
>
> Any ideas on how to fix this?
>
>
> It would be helpful if you included the version you are using, language 
> model, the command line, etc.
>
> The most likely fix is to use a different page segmentation mode on the 
> command line.
>
> Tom 
>
