Re: Localisation

2010-07-12 Thread Jeffrey Ratcliffe
On 12 July 2010 17:16, Jimmy O'Regan wrote: > Is anybody interested in seeing localisation support in Tesseract? > (Which begs the follow-up question: is anybody willing to contribute > translations for their language(s)?) I would add the support, and then upload the .pot to rosetta on launchpad.

Re: Auto training

2010-07-12 Thread Rippalka
Thanks for the answer. Even when I don't reach the "50" and I don't add it, it doesn't give any better results. I tried to detect the characters before with another language and it gave almost perfect results, so I was expecting excellent results by creating my own language. That's true that the

Re: suggestion: change download instructions slightly

2010-07-12 Thread rogerdpack
> > A couple of suggestions, therefore: > > One option: change verbiage from > > > tesseract-2.01..tar.gz contains the language data files for > > . You need at least one of these or tesseract will not work. > > > to > > > tesseract-2.01..tar.gz contains the language data files for > > . You need a

Re: suggestion: change download instructions slightly

2010-07-12 Thread rogerdpack
> > tesseract-2.01..tar.gz contains the language data files for > > . You need at least one of these or tesseract will not work. > > > to > > > tesseract-2.01..tar.gz contains the language data files for > > . You need at least one of these or tesseract will not work. > > Note that this is labeled

unable to parse numbers?

2010-07-12 Thread rogerdpack
Hi all. re-posting this in its own thread: Overall I'm having no success getting tesseract to decode this file that has a few digits on it, in either Linux or Windows. http://myfavoritepal.com/incoming/picture10.tif I am on XP, 2.04, 2.00 eng installed. It can't tell black from grey, I assume

Re: Auto training

2010-07-12 Thread Jimmy O'Regan
On 12 July 2010 16:05, Rippalka wrote: > Hi, > > I'm currently developing an OCR application, in C#, adapted to my > type of files, to archive them on a database. > I analyse character by character, which is not in Tesseract's > philosophy. > So, I did something, not very pro, but supposed to

Localisation

2010-07-12 Thread Jimmy O'Regan
Is anybody interested in seeing localisation support in Tesseract? (Which begs the follow-up question: is anybody willing to contribute translations for their language(s)?) I'm willing to add support - it's not overly complicated, just a little tedious - but not if there's no demand. -- jimrega

Auto training

2010-07-12 Thread Rippalka
Hi, I'm currently developing an OCR application, in C#, adapted to my type of files, to archive them on a database. I analyse character by character, which is not in Tesseract's philosophy. So, I did something, not very pro, but supposed to work. For each character I add this one in the main

Re: suggestion: change download instructions slightly

2010-07-12 Thread Jimmy O'Regan
On 12 July 2010 15:28, rogerdpack wrote: > Hi all.  Small suggestion on installation verbiage: > > (XP, 2.04) > > Currently it is a bit confusing to download Tesseract v 2.0.4 and then > install a language pack that reads 2.0.1 (it's hard to find in the > long list, too--for example I downloaded th

Re: Bad Read?

2010-07-12 Thread KAH
I am confused about what to do next here... the tessnet api is giving me a word that contains the markings above the text I am looking for. I am not sure what I am doing wrong here or how to go about getting the api to return those as separate words? Thanks for any direction you can offer me with

Re: Bad Read?

2010-07-12 Thread KAH
I should add even when I use the demo that is packaged with tessnet the image returns only one line... the space is not "recognized" - I understand that there are no "newlines" returned but the tessnet API is not splitting the words. I am using the default value for the parameter tosp_table_xht_sp_

suggestion: change download instructions slightly

2010-07-12 Thread rogerdpack
Hi all. Small suggestion on installation verbiage: (XP, 2.04) Currently it is a bit confusing to download Tesseract v 2.0.4 and then install a language pack that reads 2.0.1 (it's hard to find in the long list, too--for example I downloaded the language pack for 2.0.0 by mistake). A couple of su

Re: Detect numbers only

2010-07-12 Thread rogerdpack
>         You can limit Tesseract to only detect numbers and no letters, which > will help you.  You can find it here > :http://code.google.com/p/tesseract-ocr/wiki/FAQ.  Look for '*How do I > recognize only digits?'* on the page. Perhaps I'm missing something... After following the FAQ direction
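
For readers following this thread, a minimal sketch of the digits-only invocation the FAQ entry refers to, assuming the `tesseract` command-line tool is on the PATH and that the `digits` config file ships with the installed tessdata. The helper name `digits_command` and the output base name are hypothetical, chosen only for illustration.

```python
import subprocess


def digits_command(image_path, output_base):
    """Build the command line for digits-only recognition.

    The trailing "nobatch digits" arguments name config files that
    restrict the recogniser to a digit whitelist. This assumes the
    "digits" config is present in the installed tessdata/configs.
    """
    return ["tesseract", image_path, output_base, "nobatch", "digits"]


def run_digits_only(image_path, output_base):
    """Run tesseract; it writes its result to <output_base>.txt."""
    subprocess.run(digits_command(image_path, output_base), check=True)
```

Calling `run_digits_only("picture10.tif", "out")` would be expected to leave the recognised digits in `out.txt`; whether that improves results on a given image is not something this sketch can promise.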

Re: *** glibc detected *** tesseract: double free or corruption

2010-07-12 Thread zdenko podobny
Hello, How did you install Tesseract? Which version? Please provide more information. Zd. On Sun, Jul 11, 2010 at 6:16 PM, msjs08 wrote: > > I've installed Tesseract on Mandriva 2010 (64 bit) and I can't get it to > run. > It just segfaults. > I installed gimagereader. This is the error I go

Re: *** glibc detected *** tesseract: double free or corruption

2010-07-12 Thread Jimmy O'Regan
On 11 July 2010 17:16, msjs08 wrote: > > I've installed Tesseract on Mandriva 2010 (64 bit) and I can't get it to > run. > It just segfaults. > I installed gimagereader. This is the error I got when I tried to use > gimagereader What version are you using? If the image was generated by this gimag

Re: Pipe from xwdtotiff to tesseract

2010-07-12 Thread Jimmy O'Regan
On 11 July 2010 13:32, sj08 wrote: > I was wondering if I could pipe a screendump from xwdtotiff or some > other program straight to tesseract and then pipe the lot into one > long text file. (Or have tesseract append the text to a file) > I have a large number (4000 plus) of image slides (actuall

Re: Is it possible to get a confidence value for the tesseract OCR result?

2010-07-12 Thread Ian Ozsvald (A.I. Cookbook)
Hi Caroline. I'm thinking of using a dictionary approach coupled with varying thresholds to come up with votes for correct sentence parts. A rough sketch (for recognising English Heritage Plaques) is here: http://aicookbook.com/wiki/Automatic_plaque_transcription Basically: Try many thresholds, ex
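
A rough sketch of the voting idea described above, under stated assumptions: the message is cut off before it gives details, so the scheme here (one vote per dictionary word per thresholded OCR pass) is only one plausible reading, not the author's method. The function name `vote_words` and the sample strings are hypothetical; producing the per-threshold OCR outputs is left out.

```python
from collections import Counter


def vote_words(ocr_outputs, dictionary):
    """Count how many OCR passes produced each dictionary word.

    ocr_outputs: one recognised string per binarisation threshold.
    dictionary:  set of lower-case words considered valid.
    A word gets at most one vote per pass, so a high count means the
    word was read consistently across many thresholds.
    """
    votes = Counter()
    for text in ocr_outputs:
        # Normalise: strip surrounding punctuation and lower-case.
        seen = {word.strip(".,;:!?").lower() for word in text.split()}
        votes.update(word for word in seen if word in dictionary)
    return votes


# Hypothetical outputs from three different thresholds.
outputs = [
    "Charles Dickens lived here",
    "Char1es Dickens 1ived here",
    "Charles Dikens lived here",
]
words = {"charles", "dickens", "lived", "here"}
votes = vote_words(outputs, words)
```

The vote count is not a confidence value from Tesseract itself; it is an external agreement score that can be used to rank candidate words.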

Pipe from xwdtotiff to tesseract

2010-07-12 Thread sj08
I was wondering if I could pipe a screendump from xwdtotiff or some other program straight to tesseract and then pipe the lot into one long text file. (Or have tesseract append the text to a file) I have a large number (4000 plus) of image slides (actually individual .swf files) I want to grab the
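
One way to get "one long text file" without a true pipe is to run the command-line tool once per image and append each result to a single file. The sketch below assumes the images already exist as TIFF files on disk (the screen-dump and conversion step is not covered) and that `tesseract` is on the PATH; `ocr_image` and `append_all` are hypothetical helper names.

```python
import subprocess
import tempfile
from pathlib import Path


def ocr_image(image_path):
    """Run the tesseract CLI on one image and return the recognised text."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "out"
        # tesseract appends ".txt" to the output base name itself.
        subprocess.run(["tesseract", str(image_path), str(base)], check=True)
        return base.with_suffix(".txt").read_text(encoding="utf-8", errors="replace")


def append_all(image_paths, combined_path, ocr=ocr_image):
    """Append the text of every image, in order, to one combined file."""
    with open(combined_path, "a", encoding="utf-8") as combined:
        for image_path in image_paths:
            combined.write(ocr(image_path))
```

For example, `append_all(sorted(Path("slides").glob("*.tif")), "all_slides.txt")` would process a hypothetical `slides` directory in filename order. The `ocr` parameter exists so a different recogniser can be substituted for the default.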

*** glibc detected *** tesseract: double free or corruption

2010-07-12 Thread msjs08
I've installed Tesseract on Mandriva 2010 (64 bit) and I can't get it to run. It just segfaults. I installed gimagereader. This is the error I got when I tried to use gimagereader [r...@desktop test extract]# tesseract slide7.tif textfile.txt Tesseract Open Source OCR Engine *** glibc detect