Tesseract 3 and paragraph separation

2012-03-22 Thread Demian Katz
Hello, I'm using Tesseract 3 as a simple command-line tool to generate OCR. It's doing a fairly good job, but I have one unmet need -- I need to be able to separate paragraphs with blank lines. It would be great if Tesseract could do this for me, but I'd even be happy if it could include indentat

Re: tesseract under windows and paths

2012-03-22 Thread Zdenko Podobný
Hi Simon, I implemented "--disable-tessdata-prefix" for configure in revision 708. That means if you build tesseract with this option, TESSDATA_PREFIX is not set during the build process to the installation directory (usually /usr/share or /usr/local/share on Linux). I tested it in mingw+msys on Windows

Re: New training only recognize if >3 chars

2012-03-22 Thread Jose Garcia
Ok. I'm OCRing an image I've created. The training is my own, with a special digits font. The OCR works OK if the image has more than 3 digits. In the last test I just did, I saw that the problem only occurs when I use tesseractdotnet. Using tesseract.exe (with the same image and the same training) works

Re: New training only recognize if >3 chars

2012-03-22 Thread zdenko podobny
On Wed, Mar 21, 2012 at 7:22 PM, Jose Garcia wrote: > Hello, > > I've trained tesseract with only these characters: 0123456789-. > > I used one tiff with these characters, with 6 samples of each. > > After the successful training, tesseract only recognizes text if in the > input tiff there are more tha