There is a similar feature request:

https://github.com/tesseract-ocr/tesseract/issues/728

Zdenko


On Mon, 6 Feb 2023 at 3:57, Lars Aronsson <l...@aronsson.se> wrote:

> Is it possible to instruct Tesseract, given an image containing:
>
>   Let us build a snow-
>   man on the lawn.
>
> to produce this plain-text (txt) output:
>
>   Let us build a
>   snowman on the lawn.
>
> This would almost preserve line breaks, while at
> the same time making hyphenated words whole
> and searchable.
>
> It seems to me that the source code already contains
> logic to recognize hyphenated words, so it should be
> possible to implement this behaviour as an option.
>
>
> --
>    Lars Aronsson (l...@aronsson.se)
>    Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/2659e698-54b8-38cc-060e-db993aa0a1a6%40aronsson.se
> .
>
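Until such an option exists, the requested behaviour can be approximated by post-processing Tesseract's plain-text output. The sketch below is an illustration under stated assumptions, not a Tesseract feature: the function name `dehyphenate` is hypothetical, and it assumes that any line ending in a letter followed by "-" is a word broken across lines. It moves that fragment (without the hyphen) to the start of the next line, so the rest of the line breaks stay where they were. It cannot tell soft hyphenation apart from a genuine compound broken at its own hyphen: "well-" / "known" would become "wellknown".

```python
def dehyphenate(text: str) -> str:
    """Move a word hyphenated at a line end down onto the next line.

    Hypothetical helper (not part of Tesseract). Assumes a trailing
    "letter-" always marks a word split across lines; real compounds
    split at their own hyphen will be joined incorrectly.
    """
    lines = text.split("\n")
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        # Only act on lines ending in a letter followed by a hyphen,
        # so a free-standing dash like "word -" is left alone.
        if len(current) < 2 or current[-1] != "-" or not current[-2].isalpha():
            continue
        # Don't pull a fragment across a blank line (paragraph break).
        if not lines[i + 1].strip():
            continue
        # Split off the last word fragment, dropping the hyphen.
        before, _, fragment = current[:-1].rpartition(" ")
        lines[i] = before
        lines[i + 1] = fragment + lines[i + 1].lstrip()
    return "\n".join(lines)


if __name__ == "__main__":
    print(dehyphenate("Let us build a snow-\nman on the lawn."))
```

Run on the example from the question, this prints "Let us build a" followed by "snowman on the lawn." on the next line. If a line consists of a single hyphenated fragment, that line becomes empty after the fragment moves down.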
