There is a similar feature request: https://github.com/tesseract-ocr/tesseract/issues/728
Zdenko

On Mon, 6 Feb 2023 at 3:57, Lars Aronsson <l...@aronsson.se> wrote:

> Is it possible to instruct tesseract for the image:
>
> Let us build a snow-
> man on the lawn.
>
> to output in txt format:
>
> Let us build a
> snowman on the lawn.
>
> This would almost preserve line breaks, while at
> the same time making hyphenated words whole
> and searchable.
>
> It seems to me that the source has code to recognize
> hyphenated words, and it should be possible to
> implement this behaviour as an option.
>
> --
> Lars Aronsson (l...@aronsson.se)
> Project Runeberg - free Nordic literature - http://runeberg.org/

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8w5Ff21Wi4OMPQQyVKd9A5O0Ud9G14HE6X9NsxAg_Whvg%40mail.gmail.com.
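
The behaviour asked for in the quoted message can be approximated by post-processing the plain-text output. The following is only a minimal sketch of that idea and not a Tesseract option: the function name `join_hyphenated` is hypothetical, and the rule it applies (a line ending in a word fragment plus `-`, followed by a line starting with a lowercase letter) is an assumption about what counts as a hyphenated word.

```python
import re

# A word fragment followed by a hyphen at the very end of a line,
# e.g. "snow-" in "Let us build a snow-".
_TRAILING_FRAGMENT = re.compile(r"(\w+)-$")


def join_hyphenated(text: str) -> str:
    """Move a hyphenated line-end fragment down to the next line.

    Hypothetical helper: it works on already recognized text and
    knows nothing about Tesseract's internal hyphen detection.
    """
    lines = text.split("\n")
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        match = _TRAILING_FRAGMENT.search(current)
        # Assumption: a lowercase continuation means the word was split.
        if match and following[:1].islower():
            # Drop the fragment (and the space before it) from this line ...
            lines[i] = current[: match.start()].rstrip()
            # ... and prepend it, without the hyphen, to the next line.
            lines[i + 1] = match.group(1) + following
    return "\n".join(lines)


if __name__ == "__main__":
    print(join_hyphenated("Let us build a snow-\nman on the lawn."))
```

The number of lines stays the same, so line breaks are almost preserved as requested. A compound that legitimately keeps its hyphen at a line end (for example "well-" followed by "known") would be merged incorrectly by this rule, so a dictionary check would be needed for better results.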