Re: Chinese OCR - top-down right-left orientation and training

2012-11-15 Thread Devin Bean
Thanks, I appreciate the suggestions! On Friday, November 2, 2012 1:48:45 PM UTC-4, sventech wrote: > > Cutting off the borders and possibly adding white borders might help. > Normalizing out the text that bleeds through the page would also help. > The text is clear, so you might not need to ret

How to build the tesseract 3.02.02 project in Eclipse on Ubuntu?

2012-11-15 Thread Linda Li
I want to build the tesseract 3.02.02 project so that I can modify some code to tune it for a specific task. Version: tesseract 3.02.02, Ubuntu 12.04, Eclipse Juno. I put tesseract into an Eclipse project. Include directories: /usr/local/include, /usr/local, /usr/include/leptonica, and all fil

Problem with ViewerDebugging with tesseract 3.02.02

2012-11-15 Thread Linda Li
Version: tesseract 3.02.02, Ubuntu 12.04, Eclipse Juno. I am trying to use ViewerDebugging. Following the instructions at http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging, I installed javac, downloaded piccolo-1.2.jar and piccolox-1.2.jar, and built ScrollView.jar. Then I used export to set the

Re: inconsistent results from tesseract when the same TessBaseAPI object is used for decoding multiple images

2012-11-15 Thread newtotesseract
Hi Dmitri, How do we clear the adaptive classifier? Could you tell me which API call or function clears it? Best Regards, - ganesh On Friday, November 16, 2012 3:39:22 AM UTC+8, Dmitri Silaev wrote: > > Sriranga, > > All you can specify in the command line can be s

Re: Word Search Using Tessnet

2012-11-15 Thread Sven Pedersen
There is a newer wrapper for the 3.x versions: http://code.google.com/p/tesseractdotnet/w/list I think it was made by the developer of VietOCR. --Sven On Thu, Nov 15, 2012 at 5:06 PM, zdenko podobny wrote: > On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier wrote: > >> Is it possible to search an image

Re: Word Search Using Tessnet

2012-11-15 Thread zdenko podobny
On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier wrote: > Is it possible to search an image for a particular word using the Tessnet > wrapper? I see that it is possible to limit your scan to certain > characters, but what I would like to do is to input a word and have all > instances of that word be

Re: Tesseract Forms Recognition,

2012-11-15 Thread Sven Pedersen
Hi Rey, The Shared Questionnaire System (SQS) is doing something here under the Apache license: http://dev.sqs2.net/projects/ (in Java, XSLT and JavaScript). queXF assumes you create the forms yourself (under GPLv2): http://quexf.sourceforge.net/ For tesseract's license, check here: http://www.apache.

Re: Can I configure Tesseract to *always* match a dictionary word?

2012-11-15 Thread Zdenko Podobný
Regarding "user_patterns_suffix", have a look at the tesseract manual page [1]. I am not sure whether it is possible to force tesseract to choose its OCR output from the dictionary (I never tried it ;-) ). But you can increase the dictionary strength with the variables language_model_penalty_non_freq_dict_word and la
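
A minimal sketch of how such a variable could be set through a tesseract config file, which holds one "name value" pair per line. The helper name render_tesseract_config and the value 0.3 are hypothetical and chosen only for illustration; only the variable named in full in the message above is used.

```python
def render_tesseract_config(variables):
    """Render a dict of tesseract variables as config-file text.

    Each line has the form "name value", which is the format tesseract
    config files use. The helper itself is hypothetical.
    """
    lines = []
    for name, value in variables.items():
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 0.3 is an illustrative value, not a recommendation from the thread.
    text = render_tesseract_config(
        {"language_model_penalty_non_freq_dict_word": 0.3}
    )
    print(text, end="")
```

The resulting text would be saved to a file and that file named as a config argument when running tesseract; whether a larger penalty actually improves results for a given image has to be checked experimentally.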

Re: Having traindata files uncombined

2012-11-15 Thread Zdenko Podobný
Can you please use version 3.02 instead of 3.01 and post the exact error message? It is possible to copy text from the Windows console: select the relevant text/lines while holding the left mouse button, then click with the right mouse button outside the selected text but inside the console window - highlight will di

Re: ocr of image fails

2012-11-15 Thread Sven Pedersen
Yes, I think the text size (x-height) was too small. Also, the English language data may be trained with more fonts, given that Google created it. --Sven On Thu, Nov 15, 2012 at 6:43 AM, sascha4j wrote: > after converting the image with imagmagick the result is better. not 100% > but nearly. >

Re: ocr of image fails

2012-11-15 Thread sascha4j
After converting the image with ImageMagick the result is better. Not 100%, but nearly. The options for ImageMagick were: convert -colorspace gray -resize 200% -unsharp 0x8+1.5+0.05 On Thursday, November 15, 2012 10:26:21 UTC+1, sascha4j wrote: > Hi, > > i try to ocr some scanned text w
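
A small sketch of the preprocessing step described above, building the ImageMagick call as an argument list. The options are the ones quoted in the message. The function name, the file names, and the placement of the input file before the options are assumptions, since the original post does not show the file arguments.

```python
import shlex


def build_convert_command(src, dst):
    """Return an argv list for the ImageMagick call quoted in the post.

    src and dst are placeholder paths; putting src before the options
    is an assumption, not something the original message shows.
    """
    return [
        "convert", src,
        "-colorspace", "gray",       # convert to grayscale
        "-resize", "200%",           # double the size
        "-unsharp", "0x8+1.5+0.05",  # unsharp mask with the quoted parameters
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_convert_command("scan.png", "scan_prepared.png")))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where ImageMagick is installed; the prepared image is then given to tesseract in place of the original scan.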

ocr of image fails

2012-11-15 Thread sascha4j
Hi, I am trying to OCR some scanned text with tesseract-ocr. For some images the result is quite good, but for this one (see attached file) the result is poor. Any hints why, and what could I do to get a better result? I use tesseract 3.0.2 with the German language data. Greetings, sascha4j

Re: Confidence in HOCR file

2012-11-15 Thread José Luis Rey
Thanks very much for your responses zdenop, I'm not used to dev in open source projects like this, perhaps you may help me to understand, for example if I implement a feature to add character rect&confidence to the hocr output, how this is translated to the main project (if it is good enough

Re: Confidence in HOCR file

2012-11-15 Thread zdenko podobny
On Thu, Nov 15, 2012 at 10:15 AM, José Luis Rey wrote: > Thanks very much for your responses zdenop, > > I'm not used to dev in open source projects like this, perhaps you may > help me to understand, for example if I implement a feature to add > character rect&confidence to the hocr output, how

inconsistent results from tesseract when the same TessBaseAPI object is used for decoding multiple images

2012-11-15 Thread newtotesseract
Hi friends, I am using a static TessBaseAPI object in my application. This object is initialized, and reads and processes the training data, at application startup. Then, this application processes multiple scanned images through the TESS_API TessBaseAPI::ProcessPages() function, using