Re: Custom Wordlist without Retraining

2011-05-09 Thread Max Cantor
OK, I feel a bit less bad now. combine_tessdata segfaults on both Ubuntu and OS X: 182:tess max$ combine_tessdata -u eng.traineddata eng Extracting tessdata components from eng.traineddata tesseract::TessdataManager::TessdataTypeFromFileName( filename, &type, &text_file):Error:Assert failed:in f

Last time, I promise! (was: Custom Wordlist without Retraining)

2011-05-09 Thread Max Cantor
OK, I found the problem. The fix is described here: http://code.google.com/p/tesseract-ocr/issues/detail?id=356 The output path needs to end in a period. My bad. max On May 9, 2011, at 3:30 PM, zdenko podobny wrote: > no problem :-) I think you will like option "-o" too. > > Zdenko > >
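The fix described above can be sketched as a small Python helper. This is only an illustration, assuming the rule is simply that the output path passed to `combine_tessdata -u` must end in a period; the name `build_unpack_command` is hypothetical, and the helper builds the argument list without running the tool.

```python
from typing import List


def build_unpack_command(traineddata: str, output_prefix: str) -> List[str]:
    """Build the argument list for `combine_tessdata -u` (hypothetical helper).

    Assumption taken from the thread: the output prefix must end in a
    period, e.g. "eng." rather than "eng".
    """
    if not output_prefix.endswith("."):
        output_prefix += "."
    return ["combine_tessdata", "-u", traineddata, output_prefix]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_unpack_command("eng.traineddata", "eng")))
```

Building the list first makes it easy to check the trailing period before anything is executed.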

Re: Difficulties to use Tesseract

2011-05-09 Thread Patrick Collins
That's weird, I find tesseract works better with 150dpi. I can never get it to return meaningful results at 300dpi. Maybe it is just my documents? Or maybe I need to force them to grayscale? They are color documents (but all black and white anyway). On 9 May 2011 17:00, Quan Nguyen wrote: > Did

How to improve the OCR accuracy when the character closeness is small?

2011-05-09 Thread mw18888
In testing Tesseract x, we see that the software's OCR accuracy decreases under two conditions: a. the text lines are adjacent to each other (or characters are vertically adjacent to each other); b. the text characters are horizontally adjacent to each other. I wonder if there is tesseract

Re: Difficulties to use Tesseract

2011-05-09 Thread Quan Nguyen
Did you scan them correctly, with appropriate pixel resolution (~300 DPI) and monochrome/grayscale settings? On May 9, 10:20 am, Giby_the_kid wrote: > I've test with the sample of text in the sources... it has worked... > Now if I tried with any other scanned document, I get an empty text > file.

Re: Setting up Tessnet for .Net application

2011-05-09 Thread Quan Nguyen
Take a look at the source code of VietOCR.NET, which uses the tessnet2 library. http://vietocr.sf.net On May 9, 10:08 am, Vignesh Raj wrote: > Hi. Am very new to this and I need some help on how to set up tessnet > for my .Net (c#) based application. > I have not done anything yet and any link on th

Re: Difficulties to use Tesseract

2011-05-09 Thread Giby_the_kid
I've tested with the sample text in the sources, and it worked. Now if I try with any other scanned document, I get an empty text file.

Setting up Tessnet for .Net application

2011-05-09 Thread Vignesh Raj
Hi. I am very new to this and I need some help on how to set up tessnet for my .NET (C#) based application. I have not done anything yet, and any link on the basics will be very helpful. I have gone through "http://www.pixel-technology.com/freeware/tessnet2/", but could not find more info on the

Re: Best way to detect German mutated vowel (ü, ä, ö)

2011-05-09 Thread Lutz, Michael
Thanks for the information. Now to your question. I use the UNLV format because I output the string to a simple edit control (text box) in Windows, which I think supports ISO-8859-1 encoded strings, since special UTF-8 characters look funny, such as äöü. Since I found no way how to get a UTF-8 support

Re: Custom Wordlist without Retraining

2011-05-09 Thread zdenko podobny
no problem :-) I think you will like option "-o" too. Zdenko On Mon, May 9, 2011 at 8:27 AM, Max Cantor wrote: > I feel really dumb now. Sorry for the bother. > > > Thanks, max > > On May 9, 2011, at 14:01, zdenko podobny wrote: > > Please try to read (to look is not enough ;-) ) [1] : > > //

Re: Custom Wordlist without Retraining

2011-05-09 Thread Oleg Tikhonov
Hi Max, Look at: Extracts all component files from .traineddata combine_tessdata -u tessdata/ell.traineddata /home/$USER/temp/ell combine_tessdata language_data_path_prefix (e.g. tessdata/eng.) Combines all individual tessdata components (unicharset, DAWGs, classifier templates, ambiguities, lan

Re: Custom Wordlist without Retraining

2011-05-09 Thread Max Cantor
I feel really dumb now. Sorry for the bother. Thanks, max On May 9, 2011, at 14:01, zdenko podobny wrote: > Please try to read (to look is not enough ;-) ) [1] : > > > // Specify option -u to unpack all the components to the specified path: > // > > > // combine_tessdata -u tessdata/eng.tr