Re: Ocr in vietnam?

2011-04-14 Thread Sriranga(78yrsold)
Please contact Quan Nguyen ("nguyenq"), developer of Viet software. http://vietocr.sourceforge.net/developers.html Cheers, sriranga(78yrs) On Fri, Apr 15, 2011 at 6:58 AM, Max Cantor wrote: > I see that there seems to be a lot of interest in tesseract in Vietnam. I > am close by in Singapore an

Ocr in vietnam?

2011-04-14 Thread Max Cantor
I see that there seems to be a lot of interest in tesseract in Vietnam. I am close by in Singapore and make heavy use of it too. I was wondering, do you have a users group there? Is there perhaps a group of engineers looking for freelance jobs? In any case, I think we should at

Re: New user's question about recognizing Chinese

2011-04-14 Thread zl2k
It is trained data. You can try using it to recognize the document. You need to train the system only if the performance is not satisfactory. On Apr 14, 3:37 am, Justin Woo wrote: > Dear fellows of group tesseract: > I'm a beginner using tesseract, and I've noticed that the > tesseract 3.0 are

Re: TessBaseAPI error

2011-04-14 Thread zl2k
I now encode the image as a raw binary image and the error has disappeared. However, the output is totally wrong. Strangely, if I call "tesseract image.tif result -l eng" directly from the command line, the result is correct. Here are my few lines of code: tesseract::TessBaseAPI api; api.Init(NULL, "eng"); //co

Using tesseract in CUBE mode

2011-04-14 Thread Amrit
Hi All, As part of my ongoing work to develop a postal address recognizer, I was excited to discover the implementation of CUBE mode, and especially the thought that it might be used to incorporate some language modelling techniques along with tesseract. I believe that one

Re: jTessBoxEditor

2011-04-14 Thread Quan Nguyen
Version 0.2 Release: - Add a provision to set font for the Box Coordinates table - Incorporate a pangram into the Font dialog http://sourceforge.net/projects/vietocr/files/ On Apr 10, 8:01 am, Quan Nguyen wrote: > jTessBoxEditor is a box editor for Tesseract OCR data, providing > editing of box

Re: TessBaseAPI error

2011-04-14 Thread zl2k
I noticed it is in TessBaseAPI::Recognize int TessBaseAPI::Recognize(struct ETEXT_STRUCT* monitor) { //... if (thresholder_ == NULL || thresholder_->IsEmpty()) { tprintf("Please call SetImage before attempting recognition."); return -1; } //... } My input image is coded binary, sing

Re: TessBaseAPI error

2011-04-14 Thread zl2k
I removed the call to Recognize but still get the same error. Any other suggestions? On Apr 14, 5:23 am, Leonardo Gomes wrote: > I'm a newbie with Tesseract, but I know that api.GetUTF8Text() calls > Recognize for you. Try removing the call to Recognize to see what happens. > > Cheers, > Leo. > > On T

Re: Problem with eng.traineddata after 3 or 4 successful runs against different pdf's

2011-04-14 Thread E. Caudex
zdenko podobny wrote: > On Wed, Apr 13, 2011 at 2:31 AM, caudex wrote: > >> After using regedit and pointing tessdata_prefix to the right place >> and running again I got an error that referred to unicharset. The >> entire contents of my tessdata subdirectory is: >> >> Directory of C:\tesseract\

Re: TessBaseAPI error

2011-04-14 Thread Leonardo Gomes
I'm a newbie with Tesseract, but I know that api.GetUTF8Text() calls Recognize for you. Try removing the call to Recognize to see what happens. Cheers, Leo. On Thu, Apr 14, 2011 at 7:58 AM, zl2k wrote: > hi, all > > I have composed a Pix image and here are the 4 lines of code to > recognize the

TessBaseAPI error

2011-04-14 Thread zl2k
Hi all, I have composed a Pix image, and here are the 4 lines of code to recognize the input image (I am using tesseract 3.00): tesseract::TessBaseAPI api; // construct pix_image api.SetImage(pix_image); api.SetRectangle(0, 0, pixGetWidth(pix_image), pixGetHeight(pix_image)); api.Recognize(NULL); s

New user's question about recognizing Chinese

2011-04-14 Thread Justin Woo
Dear fellows of group tesseract: I'm a beginner using tesseract, and I've noticed that tesseract 3.0 supports many more languages, including Chinese. But it seems that to recognize a new language, I must train it myself using box files and so on, which is a really tough task for

Re: Problem with eng.traineddata after 3 or 4 successful runs against different pdf's

2011-04-14 Thread zdenko podobny
On Wed, Apr 13, 2011 at 2:31 AM, caudex wrote: > After using regedit and pointing tessdata_prefix to the right place > and running again I got an error that referred to unicharset. The > entire contents of my tessdata subdirectory is: > > Directory of C:\tesseract\Tesseract-OCR\tessdata > > 04/0