Re: Training Tesseract for Bengali script problem executing cntraining

2010-04-20 Thread Tanmay Kolay
2.04

On Mon, Apr 19, 2010 at 7:14 PM, Sriranga(77yrsold) wrote:
> which version of cntraining you are using?
>
> On Mon, Apr 19, 2010 at 5:55 PM, Tanmay Kolay wrote:
>> When executing cntraining (cntraining file1.tr file2.tr) I'm getting
>> "cnTraining.exe has encountered a problem and needs

Training for Times New Roman, only gibberish?

2010-04-20 Thread RubenK
Hello good people! I've been having a go at tesseract this week and I've come across the following issue: for some reason, any document in Times New Roman (printed from Microsoft Word 2007 or from WordPad on Windows 7) comes out as nothing but gibberish characters after being OCR'd :-( I've been following the

Re: Training Tesseract for Bengali script problem executing cntraining

2010-04-20 Thread Sriranga(77yrsold)
Post your problem at http://groups.google.com/group/indic-ocr

On Tue, Apr 20, 2010 at 11:50 AM, Tanmay Kolay wrote:
> 2.04
>
> On Mon, Apr 19, 2010 at 7:14 PM, Sriranga(77yrsold) <withblessi...@gmail.com> wrote:
>> which version of cntraining you are using?
>>
>> On Mon, Apr 19, 2010 at

charachter set

2010-04-20 Thread Ramon
Hi, I'm using the latest version from the repository (v3?). My OCR training language is Catalan. I'm using the Spanish training set from the download page of this Google group to train all characters. My word list is about 500,000 words (in fact 250,000 words, each in a lowercase and an uppercase version) and ocr w

Re: Disable Special characters?

2010-04-20 Thread Neil Benn
Hello, The main wiki page says that you do not need to specify the path to the conf files but if you scroll down to the comments then someone has added in that you do (thanks to that person!). I'm running on Linux and I do need to specify the full path to the config files rather than assu

Re: charachter set

2010-04-20 Thread Neil Benn
Hello, Go to this URL (http://code.google.com/p/tesseract-ocr/wiki/FAQ) and look for "How do I recognize only digits?". That approach can be modified; however, before you try it, read the comments on that wiki page, as the instructions there are partly wrong and the comments have the correct ones. Cheers,
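For readers following this thread, the digits-only approach is usually adapted by writing a small config file that sets a character whitelist. The sketch below is only an illustration under stated assumptions: `tessedit_char_whitelist` is the Tesseract variable used by the stock digits config, but the helper names, the file name `digits_only`, and the exact command-line argument order are assumptions here and may differ between Tesseract versions (some 2.x releases also expect a `nobatch` argument before the config file).

```python
from pathlib import Path
from typing import List


def whitelist_config(chars: str) -> str:
    """Return the text of a config file that restricts recognition to `chars`."""
    if not chars or any(c.isspace() for c in chars):
        raise ValueError("whitelist must be non-empty and contain no whitespace")
    return f"tessedit_char_whitelist {chars}\n"


def build_command(image: str, output_base: str, config_path: Path) -> List[str]:
    """Build a tesseract argument list (hypothetical helper; order is an assumption)."""
    # Pass the full path to the config file, since one poster reports
    # that a bare config name was not found on Linux.
    return ["tesseract", image, output_base, str(config_path.resolve())]


if __name__ == "__main__":
    config_path = Path("digits_only")  # hypothetical file name
    config_path.write_text(whitelist_config("0123456789"), encoding="utf-8")
    print(" ".join(build_command("page.tif", "out", config_path)))
```

The same helper could take a different character string to allow a custom set, for example the letters of one language instead of digits.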

extracting line information

2010-04-20 Thread vikas landge
I am new to tesseract-ocr. I am interested in getting the line information from the image. For example, suppose I have the following data in my image:

Name: John Smith
Age: 25

I would like to obtain the information as two separate strings from tesseract, e.g. string1: "Name: John Smith", string2: "Age: 25". I

RE: extracting line information

2010-04-20 Thread Neil Benn
Hello, Tesseract is not great at that, though it will try to split the two lines up a bit; Tesseract is more 'pure' character recognition. If I were you I'd look at OCRopus, which can do what you are looking for. Cheers, Neil

-----Original Message-----
From: tesseract-ocr@googlegroups.com [ma
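Where Tesseract's plain-text output does keep the two lines apart, a small post-processing step can turn that output into separate strings. This is a minimal sketch, assuming the recognized text has already been read from the output file and that lines are separated by newlines; `split_lines` is a hypothetical helper, not part of Tesseract.

```python
from typing import List


def split_lines(ocr_text: str) -> List[str]:
    """Return the non-empty lines of OCR output, stripped of surrounding whitespace."""
    return [line.strip() for line in ocr_text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "Name: John Smith\n\nAge: 25\n"  # stand-in for real OCR output
    for number, line in enumerate(split_lines(sample), start=1):
        print(f"string{number}: {line!r}")
```

This only separates lines that the recognizer already emitted on separate rows; it does not recover layout that was lost during recognition.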