Re: training single pixel sign recognition

2011-05-26 Thread Dmitri Silaev
OK, my first guess was wrong. Although Tess was not designed to recognize screen fonts, especially those composed of one-pixel-wide strokes, it can sometimes produce satisfactory results. As for punctuation, Tess often treats it as noise (because of its 1-3 pixel size) and discards it completely. This

Re: training single pixel sign recognition

2011-05-26 Thread joyse1
When I use Tesseract with the ONE_WORD option during box creation, Tess recognizes the comma but not the dot or ":". Then I insert boxes for those signs. The result is as you can see in the attached picture ... On 26 May, 15:39, Joyse1 wrote: > You will find the png, box, and apply_boxes messages in the attachment
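Adding boxes by hand for signs the box pass missed comes down to writing correctly formatted lines into the .box file. The sketch below assumes the Tesseract 3.x box format (`char left bottom right top page`, with y measured from the bottom of the image); the helper names `box_line` and `from_top_left` are hypothetical, and `from_top_left` converts a box measured in an image editor (origin at the top-left) into that format.

```python
def box_line(ch: str, left: int, bottom: int, right: int, top: int, page: int = 0) -> str:
    """Format one line of a Tesseract 3.x .box file (assumed format)."""
    if right <= left or top <= bottom:
        raise ValueError("box must have positive width and height")
    return f"{ch} {left} {bottom} {right} {top} {page}"


def from_top_left(ch: str, x: int, y: int, w: int, h: int,
                  image_height: int, page: int = 0) -> str:
    """Convert an editor-style box (origin top-left) to box-file coordinates.

    Box files count y from the bottom edge, so the vertical extent is
    flipped against the image height.
    """
    bottom = image_height - (y + h)
    top = image_height - y
    return box_line(ch, x, bottom, x + w, top, page)


if __name__ == "__main__":
    # A one-pixel "." at column 10, row 16 of a 20-pixel-tall image.
    print(from_top_left(".", x=10, y=16, w=1, h=1, image_height=20))
```

For a single-pixel sign the box is exactly 1x1, which is also why such signs are easy for the segmenter to drop as noise.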

Re: training single pixel sign recognition

2011-05-26 Thread Joyse1
You will find the png, box, and apply_boxes messages in the attachment. Thanks in advance! I think I know what the issue could be here. Refer to http://code.google.com/p/tesseract-ocr/issues/detail?id=446&can=5. Despite your using another layout mode, this issue may still apply. In brief, for small

Re: training single pixel sign recognition

2011-05-26 Thread Dmitri Silaev
I think I know what the issue could be here. Refer to http://code.google.com/p/tesseract-ocr/issues/detail?id=446&can=5. Despite your using another layout mode, this issue may still apply. In brief, for small images Tess confuses background and foreground pixels. That's why it treats characte

Re: Language file for MICR font

2011-05-26 Thread Dmitri Silaev
Well, I can do that for you, provided you send me 10-20 sample image files. One thing I can't do at the moment is generate the final language files, since I abandoned Tesseract 2 a long time ago, so these could only be box/tiff pairs. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu

Re: Create traineddata from different tif and box files

2011-05-26 Thread zdenko podobny
On Thu, May 26, 2011 at 2:02 PM, Sarel van der Merwe wrote: > Hi, > > Do you know where I can locate the version 3 manual or reference guide > for Tesseract? > > The only one I know of is in the download section (tessdoc-html-3.0.0-preview1.tar.gz) ;-) Maybe Jimmi will update it for 3.01 :-) Some good informati

Re: Create traineddata from different tif and box files

2011-05-26 Thread Sarel van der Merwe
Hi, Do you know where I can locate the version 3 manual or reference guide for Tesseract? Thanks, Sarel On Thu, May 26, 2011 at 1:33 PM, zdenko podobny wrote: > Hi, > The problem is that you are using the latest version but not reading the latest > manual [1]. If I correctly understood that Germa

Re: Automate Tesseract 3.01 language data generation process

2011-05-26 Thread Mow
Thank you very much, Quan! I've successfully generated my traineddata file; although some errors appear, they seem normal. The error was the one you mentioned; I'm now using multi-page TIFFs to train Tesseract. Thank you again for your help and your script!

Re: Create traineddata from different tif and box files

2011-05-26 Thread zdenko podobny
Hi, The problem is that you are using the latest version but not reading the latest manual [1]. If I understood that German manual correctly (via Google Translate), it is for version 3.00, so it does not reflect the changes in 3.01. Another "problem": 3.01 is not released yet. It is for developers and

training single pixel sign recognition

2011-05-26 Thread Joyse1
Hi, I have a small font (Microsoft Sans Serif, size 8; string to learn: "0 1 2 3 4 5 6 7 8 9 . , :"). I can't train recognition of single-pixel signs (e.g. ".", ",", ":"). I get failures when generating .tr files. I have two versions of Tess: one with the layout analyzer turned on, and one with the one_word_only option

Re: Tesseract 3.01 Training and Error opening unicharset file

2011-05-26 Thread Holm Dressler
Hi there, thanks to everybody. YES: the full stop after "combine_tessdata k05" was missing. So the right command is "combine_tessdata k05." (with the trailing dot). In the description it looked to me like the end of the sentence. Thanks to everybody, Holm PS: I would really like to change the subject Tesseract 3.01 T
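The trailing dot matters because combine_tessdata takes a file prefix rather than a bare language name: it looks for files such as k05.unicharset and k05.inttemp and writes k05.traineddata. A minimal sketch of a wrapper that always appends the dot, assuming combine_tessdata from Tesseract 3.01 is on PATH; the helper names `combine_tessdata_args` and `run_combine` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def combine_tessdata_args(lang: str) -> list[str]:
    """Build the argv for combine_tessdata, ensuring the prefix ends in '.'.

    With prefix "k05." the tool reads k05.unicharset, k05.inttemp, ...
    Without the dot it would look for "k05unicharset" and fail.
    """
    prefix = lang if lang.endswith(".") else lang + "."
    return ["combine_tessdata", prefix]


def run_combine(lang: str) -> None:
    # Assumes the renamed training files (k05.unicharset, k05.inttemp,
    # k05.pffmtable, k05.normproto, ...) are in the current directory.
    if shutil.which("combine_tessdata") is None:
        raise FileNotFoundError("combine_tessdata not found on PATH")
    subprocess.run(combine_tessdata_args(lang), check=True)


if __name__ == "__main__":
    print(" ".join(combine_tessdata_args("k05")))
```

Normalizing the prefix in one place avoids exactly the mistake in this thread, where the dot in the documentation read like the end of a sentence.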

Create traineddata from different tif and box files

2011-05-26 Thread Holm Dressler
Hi there, I am using Tesseract 3.01 under Linux. I can successfully create traineddata from one *.tif file, but combining different tif/box files gives me an exception. These are the steps: Let's say I want to create traineddata from two tif files, 01.tif and 02.tif: 1. tesseract 01.tif 01 ba
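When training from several tif/box pairs, the per-language tools are run once over all the files together rather than once per image. The sketch below only builds the command lines for that stage; it assumes each base name already has a corrected .box file and a .tr file from a box.train run, and the mftraining flags (`-F font_properties -U unicharset -O`) are assumed from the Tesseract 3.01 training wiki, so check them against your build. The helper name `training_commands` is hypothetical.

```python
from __future__ import annotations


def training_commands(lang: str, bases: list[str]) -> list[list[str]]:
    """Build the multi-file training commands for a set of base names.

    Each base (e.g. "01", "02") is assumed to have <base>.box and
    <base>.tr already. Flags are assumed from the 3.01 training wiki.
    """
    boxes = [f"{b}.box" for b in bases]
    trs = [f"{b}.tr" for b in bases]
    return [
        # One unicharset covering every character in every box file.
        ["unicharset_extractor", *boxes],
        # mftraining and cntraining each see all .tr files in one run.
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", *trs],
        ["cntraining", *trs],
    ]


if __name__ == "__main__":
    for cmd in training_commands("k05", ["01", "02"]):
        print(" ".join(cmd))
```

The resulting inttemp, pffmtable and normproto files would then be renamed with the k05. prefix before running combine_tessdata.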

Language file for MICR font

2011-05-26 Thread Hunter
Does anyone have a MICR language file they are willing to share? I need to use Tesseract 2 (via TessNet2) to read cheque details. Tesseract has a lot of difficulty reading the MICR font at the bottom of the cheque, so it will need to be trained. Rather than wasting a day attempting to do this, it