Re: Training the Tesseract-OCR for Kannada Language

2012-05-27 Thread sridhar n
Hello Mr. Rao, can you please send the traineddata file to me, as I am stuck? I also have a problem where tesseract reads the text, but the output the engine writes to the text file is just special characters. I am unable to tell the tesseract engine to write the output

Re: Android version: is possible to get the bounding boxes of the recognized words

2012-05-27 Thread Joe Aspara
I think I've solved it using the tess-two version of tesseract, which contains four new native methods:
- TessBaseAPI::GetRegions()
- TessBaseAPI::GetTextlines()
- TessBaseAPI::GetWords()
- TessBaseAPI::GetCharacters()
For further info, visit: https://github.com/rmtheis/tess-two Il g

Android version: is possible to get the bounding boxes of the recognized words

2012-05-27 Thread Joe Aspara
Hi all, I'm looking for a way to get the bounding box coordinates of the characters in the tesseract-android-tool version. The native API interface (TessBaseAPI.java) has no native method for static int TesseractExtractResult(char** string, int** lengths, float** costs, i

Re: Training the Tesseract-OCR for Kannada Language

2012-05-27 Thread mns_rao
On May 22, 3:53 pm, sri1683 wrote:
> hi taha,
>
> thanks for the suggestion..
> i have used 6 tif images for training..
> thats what drove me to think that the traineddata file should be bigger..
>
> On May 22, 3:35 pm, Taha Alasli wrote:
> > I think that size of the traineddata

Re: Creating searchable pdf with tesseract and pdfbeads

2012-05-27 Thread Jeffrey Ratcliffe
On 27 May 2012 15:34, Zdenko Podobný wrote: >> Made with scanTailor, jbigenc, pdfbeads and Tess3.01. You can do this in one step with gscan2pdf[1] - which uses Tesseract for the OCR. Regards Jeff [1] http://gscan2pdf.sourceforge.net/ -- You received this message because you are subscribed to

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-27 Thread Zdenko Podobný
Well, thanks should go to David, who fixed the code, and Galt, who reported and tested it. My problem (apart from lack of time ;-) ) is that there is no working hocr validity tool. hocr-tools[1] has something, but it seems to have a problem with recent Python PyXML[2] (I just did a quick test). I saw some attempts that rep

Creating searchable pdf with tesseract and pdfbeads

2012-05-27 Thread Zdenko Podobný
Maybe you could write a blog post (and then post the link to the forum ;-) ) about the workflow (changes needed, time spent at each step, etc.). This could also be useful for non-tesseract communities. -- Zdenko On 26.05.2012 09:01, Galt wrote: > Here's my pdf if anyone is interested: > > http://folkpla

Re: Training tesseract

2012-05-27 Thread zdenko podobny
Just a small correction: tesseract-ocr 3.0x does not use libtiff directly, but via leptonica. -- Zdenko On Sun, May 27, 2012 at 12:25 PM, Stane wrote: > 1. > Once libtiff is properly installed you shouldn't get any problems later > on. > An alternative to the multipage things is to have each page

Re: Page layout analysis - don't split columns.

2012-05-27 Thread zdenko podobny
In 3.0x you can set the page segmentation mode (search for SetPageSegMode or the variable "tessedit_pageseg_mode"). I think the proper mode should help you. If I remember correctly, there was a report here on the forum about how to compile the current tesseract for Android. -- Zdenko On Sun, May 27, 2012 at 12:06 PM, Joe
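A minimal sketch of the two ways the page segmentation mode mentioned above is usually passed to tesseract 3.0x: as a line in a config file that sets "tessedit_pageseg_mode", or as the "-psm" command-line option. The helper names, the file names and the choice of mode 6 (a single uniform block of text) are assumptions for illustration only; check the mode numbers against your own build, and note that the Android wrapper would call SetPageSegMode from code instead.

```python
# Mode numbers as documented for tesseract 3.0x (assumption: verify with your build).
PSM_SINGLE_COLUMN = 4  # a single column of text of variable sizes
PSM_SINGLE_BLOCK = 6   # a single uniform block of text


def pageseg_config_line(psm):
    """Return a config-file line that sets tessedit_pageseg_mode."""
    return "tessedit_pageseg_mode {}\n".format(psm)


def build_command(image, outputbase, psm):
    """Build (but do not run) a tesseract 3.0x command line using -psm."""
    return ["tesseract", image, outputbase, "-psm", str(psm)]


# Hypothetical file names, for illustration only.
print(pageseg_config_line(PSM_SINGLE_BLOCK), end="")
print(" ".join(build_command("page.tif", "page_out", PSM_SINGLE_BLOCK)))
```

Treating the page as one block or one column is one way to stop the layout analysis from splitting it into several columns; whether it helps depends on the document.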

Re: Training tesseract

2012-05-27 Thread Stane
1. Once libtiff is properly installed you shouldn't get any problems later on. An alternative to multipage TIFFs is to have each page as a single TIFF file, numbered sequentially, like: [lang].[fontname].exp[num]
2. Not sure how important the fitting of the bounding box around each character is, bu
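As a rough illustration of the [lang].[fontname].exp[num] naming scheme from point 1, the sketch below generates one file name per page. The function name, the ".tif" extension, the numbering starting at 0 and the sample language and font names are assumptions, not something stated in the post.

```python
def training_page_names(lang, fontname, count, ext="tif"):
    """Return one file name per page, following [lang].[fontname].exp[num].

    The extension and the start index (0) are assumptions for this sketch.
    """
    return ["{}.{}.exp{}.{}".format(lang, fontname, num, ext)
            for num in range(count)]


# Hypothetical language code and font name, for illustration only.
for name in training_page_names("kan", "examplefont", 3):
    print(name)
```

Each single-page image would then be saved under one of these names instead of being combined into a multipage TIFF.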

Re: Character recognition based on location on image / document

2012-05-27 Thread Stane
Tesseract doesn't have any real postprocessing yet, which means you will get the words in the same order as they appear in the input. You have to handle reordering the words yourself. Maybe, if the output of tesseract is good enough, with all the commas in the right place, you could try to use them as a sepa

Re: Page layout analysis - don't split columns.

2012-05-27 Thread Joe Aspara
Thanks Brock, unfortunately I must use the Android version of tesseract, so I need to find a programmatic solution to this problem.

Re: Need Help with my C# Wrapper

2012-05-27 Thread ruwanthaka
Hi EricD, can you please share your source code with me? I hope it is related to C# and VS2010. My goal is to develop an OCR application for the Sinhala language, which is used in my country. I have been following the general guides but have not had any success. Thanks in advance. Ruwanthaka

Re: Page layout analysis - don't split columns.

2012-05-27 Thread Brock Henry
Joe, I got over my problem, though I don't remember how. I think I updated to the latest svn version, and no longer had the problem. On Sunday, 27 May 2012, Joe Aspara wrote: > I have the same problem reported by Brock. Does anyone have a solution to force tesseract to read one line at a time, ignoring