On Fri, Apr 22, 2016 at 12:27 AM, S.J. Becker <scottbecke...@gmail.com> wrote:
>
> I just did more testing.
>
> My one word or single character image works with
> -psm 7
> -psm 8
>
> my two or three lines of text image works with the default of
> -psm 3
> as well as
> -psm 4
>
> They both seem to work with
> -psm 6
>
> I may have to go with 6 even though my three line test with different
> font sizes should be done with 4 based on its description.
>
> I feel it's a bug that 3 and 4 can't reliably handle simpler content.
> To get the most out of Tesseract, I must analyze the segmentation?!
>

Why analyze? Don't you know in advance whether you are asking it to OCR a page or just a paragraph, line or word?

>
> That is why I had to go through the trouble of compiling leptonica;
> so that tesseract is smart enough that I don't have to re-invent the wheel.
>

Tesseract uses leptonica as a dependency so that it does not need to re-invent the wheel.

>
> It seems that it's failing at the segmentation stage. If it finds nothing
> it could try again automatically with a more primitive setting. That is
> way more efficient than my process spawning tesseract twice as often.
>
> thanks
> scott
>
> On Thursday, April 21, 2016 at 4:21:47 AM UTC-7, zdenop wrote:
>>
>> Please read the wiki
>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
>>
>> Zdenko
>>
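
For the retry-on-empty idea raised above, a caller-side sketch in Python might look like the following. It assumes the `tesseract` binary is on the PATH and accepts `stdout` as the output base. The names `FALLBACK_PSMS`, `run_tesseract` and `ocr_with_fallback` are hypothetical, and the order of modes is only an illustrative guess, not a Tesseract recommendation. The `-psm` spelling is the one used in this thread; newer Tesseract releases spell it `--psm`.

```python
import subprocess

# Page segmentation modes to try, from most automatic to most constrained.
# This order is an assumption for illustration only.
FALLBACK_PSMS = (3, 6, 7, 8)


def run_tesseract(image_path, psm, psm_flag="-psm"):
    """Run the tesseract CLI once and return whatever text it printed.

    psm_flag defaults to "-psm" as used in this thread; pass "--psm"
    for newer Tesseract releases.
    """
    cmd = ["tesseract", image_path, "stdout", psm_flag, str(psm)]
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout


def ocr_with_fallback(image_path, psms=FALLBACK_PSMS, runner=run_tesseract):
    """Try each mode in turn and stop at the first non-empty result.

    Returns (psm, text), or (None, "") if every mode produced nothing.
    """
    for psm in psms:
        text = runner(image_path, psm).strip()
        if text:
            return psm, text
    return None, ""
```

When the kind of input is known in advance, passing a single mode (for example `psms=(7,)` for one text line) skips the retries entirely. Note that this sketch still starts one process per attempt, so it does not remove the extra spawns; it only stops as soon as a mode returns text.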