Re: Language file for MICR font

2011-06-01 Thread Hunter
[I keep posting this, but it doesn't show up after 1 day. Attempt #4] I only have a limited number of samples, but that seems to be enough for now (100% detection rate for everything I have). If I can collate a few more, I will retrain and post the updated language files and maybe a T3 compil

Re: Language file for MICR font

2011-06-01 Thread Hunter
[I keep posting this, but it doesn't show up after several hours. Removing the link to see...] I only have a limited number of samples, but that seems to be enough for now (100% detection rate for everything I have). If I can collate a few more, I will retrain and post the updated language fi

Re: Language file for MICR font

2011-06-01 Thread Hunter
I only have a limited number of samples, but that seems to be enough for now (100% detection rate for everything I have). If I can collate a few more, I will retrain and post the updated language files and maybe a T3 compiled file. I have posted the source files and the compiled T2 language fi

DAWG in sync?

2011-06-01 Thread Davide Morellato
The TrainingTesseract3 wiki says: "NOTE The unicharset file must be regenerated whenever inttemp, normproto and pffmtable are generated (i.e. they must all be recreated when the box file is changed) as they have to be in sync". My question is: must the DAWG files be in sync, too? Or can I use DAWG

Re: Recognizing short strings like "1-A"

2011-06-01 Thread Joyse1
On 2011-05-27 14:30, Utkarsh wrote: After manipulating an image, I'm sending a very short string to tesseract. The image has strings like "1-A", "3-F", etc. The problem is that it cannot detect these characters on their own. I tried adding a dummy word after it (to make it look like "1-A test") and

Re: Preprocessing / Colored Background and DotMatrix Printer

2011-06-01 Thread Erik Reisig
Hi Dmitri, thank you very much for your hint. I retrained the engine with the appropriate font and did thresholding with 80% on my target zone. Additionally, I used a custom dictionary with the drugs that can be prescribed. Now I'm reaching nearly 100% OCR accuracy in my specific target zone. Bi

Re: Preprocessing / Colored Background and DotMatrix Printer

2011-06-01 Thread Dmitri Silaev
Nice, glad I could help )) -- Dmitri On Mon, May 30, 2011 at 5:03 AM, Erik Reisig wrote: > Hi Dmitri, > > thank you very much for your hint. I retrained the Engine with the > appropriate font and did thresholding with 80% on my target zone. > Additionally I used a custom dictionary with the dr

Re: DAWG in sync?

2011-06-01 Thread Dmitri Silaev
In previous releases of Tesseract you could use the same unicharset file as long as the character set stayed the same. In recent releases, the unicharset file format has been extended, and it now contains info for restricting the relative size and position of glyphs. However, you can still use the old "sho

Re: Create traineddata from different tif and box files

2011-06-01 Thread Holm Dressler
Hi there, OK, I figured it out myself. Here are the steps: 1. Create 01.tr with: tesseract 01.tif 01 nobatch box.train 2. Create 02.tr with: tesseract 02.tif 02 nobatch box.train 3. Create unicharset with: unicharset_extractor 01.box 02.box 4. Just copy it (maybe it is not necessary): cp unicharset 0
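
For anyone scripting this, here is a rough sketch of the first three steps only (the later steps are cut off in the excerpt above, so they are left out). It just builds the command lines quoted in the message for an arbitrary list of page names; the function name `training_commands` is made up for illustration, and nothing is executed.

```python
def training_commands(bases):
    """Build the commands from steps 1-3 of the message as argument lists.

    `bases` are the file name stems, e.g. "01" for 01.tif / 01.box.
    The commands are only constructed here, not run.
    """
    # Steps 1 and 2: one box.train run per image, as quoted in the message.
    cmds = [["tesseract", f"{b}.tif", b, "nobatch", "box.train"] for b in bases]
    # Step 3: a single unicharset_extractor call over all box files.
    cmds.append(["unicharset_extractor"] + [f"{b}.box" for b in bases])
    return cmds


if __name__ == "__main__":
    for cmd in training_commands(["01", "02"]):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` if you want to execute the steps rather than print them.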

Re: Create traineddata from different tif and box files

2011-06-01 Thread zdenko podobny
It is written in the training doc [1]: "…each .tr filename must match an entry in the font_properties file, or mftraining will abort." So you could have saved yourself some time by reading the documentation. Zdenko [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01) On
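
A quick pre-flight check along the lines of the quoted rule might look like the sketch below. It rests on two assumptions that are not stated in this thread: that .tr files are named `lang.fontname.expN.tr`, and that each font_properties line starts with the font name followed by its flags. The helper names are hypothetical, and this is not how mftraining itself does the matching.

```python
def font_names(font_properties_text):
    """Collect the first field of every non-empty font_properties line."""
    names = set()
    for line in font_properties_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_fonts(tr_filenames, font_properties_text):
    """Return the .tr files whose font name has no font_properties entry.

    Assumption: file names look like lang.fontname.expN.tr. Names that do
    not fit this pattern are reported as missing as well.
    """
    known = font_names(font_properties_text)
    missing = []
    for name in tr_filenames:
        parts = name.split(".")
        font = parts[1] if len(parts) >= 4 else None
        if font not in known:
            missing.append(name)
    return missing


if __name__ == "__main__":
    props = "arial 0 0 0 1 0\n"
    print(missing_fonts(["eng.arial.exp0.tr", "01.tr"], props))
```

Running something like this before mftraining would flag files such as 01.tr that carry no font name at all under the assumed naming scheme.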