Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-18 Thread shree
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
See whether using OSD to detect the script helps you choose the correct traineddata. On

[tesseract-ocr] Re: Need Help with extracting info from Invoice

2014-11-18 Thread Allistair C
I wonder if there is anything consistent about the invoice design? For instance, I notice that your invoice has "Honda" logos on the top left and top right, essentially providing two anchors from which you could extrapolate the resolution and the location/orientation of the table of data. You could also l

[tesseract-ocr] Re: Using latex to train tesseract

2014-11-18 Thread lauhlau
Hi, I am trying to do what you did. I noticed that this topic is really old (7 years!), but I could not download your script files script1.pl and correlatebox.pl (404 not found error). Do you still have them anywhere? Thanks in advance. On Sunday, 28 October

Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-18 Thread Ryan Dev
Thanks again.
> you may get better results using appropriate language data rather than just the ASCII range. Are the client documents sorted by language?
I'm not sure how they have them organised, I just know they want an "automatic" solution...
> I am attaching files used - i had just c

[tesseract-ocr] Contents of shapetable

2014-11-18 Thread sibi kanagaraj
Dear all, I am in the process of creating a training data set for the Tamil language (Indic script). I have created the shapetable using the following command: shapeclustering -F font_properties -U unicharset tam.monospaced.exp0.tr For every step before this, I was able to see the contents of t