Re: [tesseract-ocr] Train for big letters in the beginning of the sentences(pic)

2020-08-06 Thread tlit...@gmail.com
Okay, I see. Very interesting articles, thank you. Since I don't know any other method for line segmentation, I used the hOCR output from tesseract and then hocr-tools (I dug that out of some older GitHub issues), and that's how I generated line images for ground truth. Then I manually checked about
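
For readers who want to reproduce the hOCR-based line extraction described above, here is a minimal sketch using only the Python standard library. It is not the poster's actual script and not how hocr-tools does it internally. It assumes that text lines in the hOCR file are elements with class "ocr_line" and that each carries a title attribute of the form "bbox x0 y0 x1 y1". The names LineBoxParser and line_boxes are made up for this illustration.

```python
import re
from html.parser import HTMLParser


class LineBoxParser(HTMLParser):
    """Collect bounding boxes of elements whose class includes 'ocr_line'.

    Hypothetical helper for illustration; assumes hOCR-style
    title="bbox x0 y0 x1 y1; ..." attributes.
    """

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "ocr_line" not in classes:
            return  # skip pages, paragraphs, words, etc.
        match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)",
                          attributes.get("title") or "")
        if match:
            self.boxes.append(tuple(int(v) for v in match.groups()))


def line_boxes(hocr_text):
    """Return a list of (x0, y0, x1, y1) tuples, one per text line."""
    parser = LineBoxParser()
    parser.feed(hocr_text)
    return parser.boxes


if __name__ == "__main__":
    # Tiny hand-written hOCR fragment, not real tesseract output.
    sample = (
        "<div class='ocr_page' title='bbox 0 0 600 200'>"
        "<span class='ocr_line' title='bbox 10 20 590 60; baseline 0 -5'>first</span>"
        "<span class='ocr_line' title='bbox 10 70 590 110'>second</span>"
        "</div>"
    )
    for box in line_boxes(sample):
        print(box)
```

Each tuple is in (left, upper, right, lower) order, so it can be passed to an image library's crop function to cut out one line image per box. A drop cap that spans two lines will still end up wherever the line finder assigned it, so the crops need the same manual check mentioned above.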

Re: [tesseract-ocr] Train for big letters in the beginning of the sentences(pic)

2020-08-05 Thread Tom Morris
The technical term for these is "drop caps," which is useful to know if you want to Google for it. It's pretty dated now, but Ray's 2007 description of the line finding algorithm says: "Assumi

Re: [tesseract-ocr] Train for big letters in the beginning of the sentences(pic)

2020-08-05 Thread tlit...@gmail.com
That's right, that initial "TO", and this is just a fraction of the text; there are dozens of examples like "TO" on a single page. But since it spreads over two lines, I assume there's nothing I can do?

Re: [tesseract-ocr] Train for big letters in the beginning of the sentences(pic)

2020-08-04 Thread Zdenko Podobny
Not sure what you mean...
tesseract big_low.jpeg - --psm 6
Warning: Invalid resolution 0 dpi. Using 70 instead.
FY, MINERS.—TO LET, ON LEASE, on such terms as may be agreed on, the MINERALS in the ESTATE of KNOCKSHINNOCK, lying in the parish of New Cumnock, and county of Ayr. Acdead vein has be

[tesseract-ocr] Train for big letters in the beginning of the sentences(pic)

2020-08-04 Thread tlit...@gmail.com
Hello, is it possible to train for the bigger fonts at the beginning of sentences? It seems that tesseract always misses them. Thanks in advance.