On Fri, Jun 8, 2012 at 11:40 AM, Nick White <nick.wh...@durham.ac.uk> wrote:

> Hi Zdenko,
>
> I saw the descriptions you give below, I just wasn't very clear on
> what they meant.
>
> On Thu, Jun 07, 2012 at 02:50:57PM +0200, zdenko podobny wrote:
> > lang.punc-dawg
> > (Optional) A dawg made from punctuation patterns found around words. The
> > "word" part is replaced by a single space.
> > lang.number-dawg
>
> So for English, ( ) and " " spring to mind. Is this the sort of
> thing that is expected?
>
Yes. Have a look at *.punc-dawg and *.number-dawg for more examples (e.g.
"http://rapidshare.com/files/         /HITMAN.part  .rar" ;-))


> > (Optional) A dawg made from tokens which originally contained digits. Each
> > digit is replaced by a space character.
>
> Ah, looking at one of the official trainings with dawg2wordlist I
> see entries such as '(c)    ' (without quotes.) Thanks, that makes
> sense. Though I'm surprised (and impressed) that Tesseract goes down
> to that level of granularity in its scanning.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
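
To make the two descriptions above concrete, here is a minimal Python sketch of
the transformations as they are described in this thread: every digit becomes a
space (number-dawg), and the "word" part becomes a single space (punc-dawg).
This is only an illustration, not how Tesseract's training tools are
implemented. The function names are hypothetical, and treating the "word" as
everything from the first to the last alphanumeric character is an assumption.

```python
import re
from typing import Optional


def number_pattern(token: str) -> Optional[str]:
    """Replace each ASCII digit with a space (hypothetical helper).

    Returns None when the token contains no digit, since such a token
    would not contribute a number pattern.
    """
    if not re.search(r"[0-9]", token):
        return None
    return re.sub(r"[0-9]", " ", token)


def punctuation_pattern(token: str) -> Optional[str]:
    """Replace the "word" part of a token with a single space (hypothetical helper).

    Assumption: the word runs from the first alphanumeric character to the
    last one; whatever surrounds it is kept as the punctuation pattern.
    Returns None when the token has no alphanumeric character at all.
    """
    positions = [i for i, ch in enumerate(token) if ch.isalnum()]
    if not positions:
        return None
    first, last = positions[0], positions[-1]
    return token[:first] + " " + token[last + 1:]


if __name__ == "__main__":
    # Illustrative tokens only; they are not taken from any official training data.
    print(repr(number_pattern("(c)2012")))
    print(repr(punctuation_pattern("(word)")))
    print(repr(punctuation_pattern('"word"')))
```

Under these assumptions, a token like (c)2012 would yield the kind of entry
Nick saw in the dawg2wordlist output, and (word) or "word" would yield the
( ) and " " patterns he guessed at.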

