[tesseract-ocr] Re: Tesseract OCR 4.0.0 Alpha how to train a new font

2017-09-04 Thread shree
Try san_latn.traineddata from https://github.com/Shreeshrii/tessdata4alpha/tree/master/best On Tuesday, August 29, 2017 at 12:19:10 PM UTC+5:30, Anand Akella wrote: > > Hi, > I'm new to Tesseract and have a PDF

RE: [tesseract-ocr] Image preprocessing

2017-09-04 Thread Art Rhyno
Hi Lada, Since you are already using OpenCV, extracting the region based on its shape might be an option. Adrian Rosebrock has an example of shape detection with OpenCV here [1]. I put together a simple example [2] in Python that extracts the biggest rectangle, which would be the area with t

Re: [tesseract-ocr] Digits only for tesseract4

2017-09-04 Thread ShreeDevi Kumar
Tesseract 4 does not honor the character whitelist, the digits config, etc. Use an older version such as 3.02 or 3.04. On 05-Sep-2017 12:24 AM, "Declic73" wrote: > Hello, > > I am trying Tesseract 4 for a project to read digits on different > surfaces. > > Currently I invoke Tesseract with the following options: > --oem 1
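
If downgrading is not possible, one possible workaround (an assumption, not something suggested in this thread) is to let Tesseract 4 recognize everything and strip the non-digit characters afterwards. A minimal sketch in Python; keep_digits is a hypothetical helper name, and the input string stands in for whatever text the OCR run returned:

```python
import re


def keep_digits(ocr_text: str) -> str:
    """Remove every character that is not an ASCII digit or whitespace,
    then collapse runs of whitespace into single spaces."""
    digits_only = re.sub(r"[^0-9\s]", "", ocr_text)
    return " ".join(digits_only.split())


if __name__ == "__main__":
    # Hypothetical OCR output; the "S" is a misread that gets dropped, not corrected.
    print(keep_digits("Meter: 0123 4S6\n"))
```

Note that this only discards unwanted characters. Unlike a real whitelist, it cannot steer the recognizer toward digits, so a 5 misread as "S" is lost rather than recovered.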

[tesseract-ocr] unicharset and boxfile for tesseract 4

2017-09-04 Thread Ava Nimaee
Hi, I want to know about the unicharset and box file in Tesseract 4 for RTL scripts. I trained, but the result is not good. Can anyone give me a link about this, and also about xheight?

[tesseract-ocr] Re: use two language in tesseract

2017-09-04 Thread Ava Nimaee
Using -l fas+eng works perfectly when both languages are LTR or both are RTL. If your text mixes LTR and RTL, the result will not be perfect. On Monday, September 4, 2017 at 9:02:47 AM UTC+4:30, peiman F. wrote: > > @reza > this is a known problem and will be resolved at the next traineddata > generation

[tesseract-ocr] Digits only for tesseract4

2017-09-04 Thread Declic73
Hello, I am trying Tesseract 4 for a project to read digits on different surfaces. Currently I invoke Tesseract with the following options: --oem 1 -l eng -c tessedit_char_whitelist=0123456789 digits when using the "best" tessdata (from https://github.com/tesseract-ocr/tessdata/tree/master/be