Re: tesseract testing suite

2013-05-08 Thread Shree Devi Kumar
t works fine. Thanks, I can now check the accuracy of my output. Shree

Re: tesseract testing suite

2013-05-08 Thread Shree Devi Kumar
in a directory and compare them to the ground-truth files without running tesseract again? In DOS, I can use something like: for /f "delims=|" %%F in ('dir san.input.*.txt /b') do ( ) How would I do something similar in Bash? Thanks, Sh
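Below is a minimal Bash counterpart of that DOS loop. It compares existing OCR output files against ground truth without re-running tesseract. It assumes a hypothetical layout in which each ground-truth file has the same name as its output file and lives in a separate directory, passed as the first argument. compare_outputs is an illustrative name, not part of any tool.

```shell
#!/usr/bin/env bash
# compare_outputs (hypothetical helper): report which san.input.*.txt files
# in the current directory differ from the same-named file in the
# ground-truth directory given as $1. The layout is an assumption.
compare_outputs() {
  local gtdir="$1" f mismatches=0
  # The glob plays the role of DOS's: for /f ... in ('dir san.input.*.txt /b')
  for f in san.input.*.txt; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    if diff -q "$f" "$gtdir/$f" > /dev/null 2>&1; then
      echo "OK   $f"
    else
      echo "DIFF $f"          # also reported when the ground-truth file is missing
      mismatches=$((mismatches + 1))
    fi
  done
  # Succeed only if every output file matched its ground truth.
  [ "$mismatches" -eq 0 ]
}
```

Run it from the output directory, for example `compare_outputs groundtruth`. To see the actual differences rather than a per-file summary, drop `-q` from the diff call.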

Re: jTessBoxEditor 0.6 Beta release

2013-05-12 Thread Shree Devi Kumar
Are you training the Odia language? Have you seen http://tdil-dc.in/tdildcMain/articles/374232Odia%20Script%20Grammar_Ver1.0.pdf ? Shree Devi Kumar भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, May 11, 2013 at 9:01 PM

Re: [Announcement] QT Box Editor 1.10

2013-05-20 Thread Shree Devi Kumar
d all the words are maximally chopped. There is also a config called rebox. I was just looking to see if the export of one line at a time was related to a new boxfile format in any way. I am trying to create some data using scanned images, and it is faster to edit it a line at a time rather than o

Re: jTessBoxEditor 0.6 Beta release

2013-05-21 Thread Shree Devi Kumar
Mamata, please see https://code.google.com/p/tesseract-ocr/downloads/list for the available language data files for tesseract 3.02. If Odia is similar to Bangla, you can use the Bengali traineddata to bootstrap for Odia. Shree

Training Oriya Language

2013-06-01 Thread Shree Devi Kumar
training text in Oriya. I would suggest that you first create the Oriya traineddata using the Lohit files and then add your files one by one. Shree On Sat, Jun 1, 2013 at 11:35 AM, mamata nayak wrote: > Sir, > please help me > Actually character set of my language consists of about

Re: tesseract testing suite

2013-06-03 Thread Shree Devi Kumar
Thanks, Nick. Unix is indeed cool, when one knows how :-) Thanks so much for the commands; I appreciate the help. Shree On Mon, Jun 3, 2013 at 4:34 PM, Nick White

Re: Cube documentation, training source files, and openness

2013-06-03 Thread Shree Devi Kumar
Great idea! I would suggest putting the documentation in a wiki instead of here. That way it will be easier to refer to and find later. Shree On Mon, Jun 3, 2013

Re: preserving spaces

2013-06-12 Thread Shree Devi Kumar
not get recognized correctly. Maybe there are some config variables that I need to tweak to fix this. On Mon, Jun 10, 2013 at 6:07 PM, Nick White wrote: > Hi E

Re: Cube documentation, training source files, and openness

2013-06-14 Thread Shree Devi Kumar
there is minimal nn code in 3.02. Please see: http://www.cedricve.me/2013/04/12/how-to-train-tesseract/ Shree On Mon, Jun 10, 2013 at 10:28 PM, Nick White wrote

Re: Error (cnTraining.exe has encountered a problem and needs to close)

2013-06-29 Thread Shree Devi Kumar
Hi Alan, For a GUI front end for tesseract-ocr, you can try VietOCR (http://vietocr.sourceforge.net/). I have found it useful. Shree On Fri, Jun 28, 2013 at 8:41 PM

Re: Word Confidence in hOCR Not Working

2013-07-01 Thread Shree Devi Kumar
http://tesseract-ocr.googlecode.com/svn-history/r831/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources On Sun, Jun 30, 2013 at 11:01 PM, Perry

Re: Japanese detection parameter

2013-07-02 Thread Shree Devi Kumar
Hello Zdenko and Nick, Could one of you please add this info to the wiki documentation? It will be helpful to other users. Thanks, Shree On Tue, Jul 2, 2013 at

Re: How to include release number when building tesseract project with Visual Studio 2008?

2013-07-02 Thread Shree Devi Kumar
-ocr.googlecode.com/svn-history/r831/trunk/vs2008/doc/maintenance.html On Tue, Jul 2, 2013 at 12:21 PM, sdk wrote: > Hello All, > > I built tesseract proj

Re: tesseract training flags to rtl languages

2013-07-07 Thread Shree Devi Kumar
Also see: https://code.google.com/p/tesseract-ocr/issues/detail?id=811 https://github.com/reza1615/PersianOcr/blob/master/Convertor%20unicharset%20to%20RTL.py On

Re: How to train tesseract for new fonts?

2013-07-11 Thread Shree Devi Kumar
i/Sanskrit. Shree On Thu, Jul 11, 2013 at 7:20 PM, matthew christy wrote: > If you do find a font with whatthefont, then use the directions here: > https:/

Re: How to train tesseract for new fonts?

2013-07-12 Thread Shree Devi Kumar
Thanks, Matthew. I have registered for Prima Tools. However, since I am not affiliated with any institution, I am not sure whether they will approve my registration. I haven't heard back yet. I'll wait to see if I can use Franken+ with my existing training files. Thanks, Shree

Re: The way the path to tessdata directory is defined.

2013-07-14 Thread Shree Devi Kumar
le is used? Thanks, Shree On Mon, Jul 15, 2013 at 2:26 AM, zdenko podobny wrote: > I play a little bit with Dmitry Katsubo patch. Based on it I suggest to >

Re: The way the path to tessdata directory is defined.

2013-07-15 Thread Shree Devi Kumar
Thanks, Nick! I had been using the full paths. Your response validates that approach. Regards, Shree On Mon, Jul 15, 2013 at 4:46 PM, Nick White wrote: >

Re: Training for Burmese (New Language)

2013-07-15 Thread Shree Devi Kumar
-ocr-part-2-training-characters.html On Mon, Jul 15, 2013 at 4:18 PM, Sithu Thwin wrote: > I want to train tesseract for Burmese. But I don't know how to d

Re: Where can I set tessedit_ocr_engine_mode for tesseract-ocr?

2013-07-15 Thread Shree Devi Kumar
see http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html Shree On Mon, Jul 15, 2013 at 9:12 PM, bear wrote: > Thanks, Nick. After pok

Re: The way the path to tessdata directory is defined.

2013-07-16 Thread Shree Devi Kumar
Thank you, Zdenko and Nick, for the clarifications. Shree

Re: FAIL! APPLY_BOXES errors in evidently perfect training samples

2013-07-17 Thread Shree Devi Kumar
Please post it as an issue in the tracker (http://code.google.com/p/tesseract-ocr/issues/list). I am having similar problems too. Shree On Sun, Jun 30, 2013 at 3:21 AM, TedJ

Re: FAIL! APPLY_BOXES errors in evidently perfect training samples

2013-07-18 Thread Shree Devi Kumar
chance, please take a look at the box/tif pair and let me know what can be done to fix it. When I make box files using tesseract or qt-box-editor, many words are chopped incorrectly, and when I make those corrections, I get the apply_boxes error. I am looking for some way around this. Thanks, Shree On

Re: Reference to bangla Box file

2013-08-04 Thread Shree Devi Kumar
I would suggest that you use the latest version of tesseract-ocr (3.02) and then try this. Shree On Sun, Aug 4, 2013 at 12:23 PM, mamata nayak wrote: > Sir, > I have found the box files for bangla language for tesseract version 2. > > It uses two box entries for one character in

Re: Need help reg pre-processing of image before ocr

2013-08-23 Thread Shree Devi Kumar
Thanks, Sven. Yes, that's the kind of improvement I am looking for. I have read that ImageMagick is helpful in fixing the images. I'll give it a try. I was hoping that someone in the group would mention the settings they used to fix similar grainy images. Shree

Re: Need Help brightness/contrast/resolution

2013-08-25 Thread Shree Devi Kumar
think). Hope this helps. On Sun, Aug 25, 2013 at 6:54 PM, Baramon wrote: > After 3 hours of googling i realized Tesseract can't do this without help > of I

Re: Need Help brightness/contrast/resolution

2013-08-25 Thread Shree Devi Kumar
Have you tried VietOCR (http://vietocr.sourceforge.net/) to see if it gives you better results? On Sun, Aug 25, 2013 at 7:05 PM, Baramon wrote: > Yeah i found

Re: Training new language

2013-08-25 Thread Shree Devi Kumar
/ http://ayoungprogrammer.blogspot.in/2013/01/equation-ocr-part-2-training-characters.html http://www.resolveradiologic.com/blog/2013/01/15/training-tesseract/ On Mon

Re: Need help reg pre-processing of image before ocr

2013-08-25 Thread Shree Devi Kumar
training data in order to get the character shapes to match the typeface of the book, and will share that traineddata. Shree On Sat, Aug 24, 2013 at 9:09 AM, Sriranga(79yrs

Re: Tesseract Training

2013-08-28 Thread Shree Devi Kumar
ed well by any of the existing traineddata. However, please note that currently each training run starts from scratch (it does not build upon existing data). The next release with updates from Google may have a different procedure.

Re: new language, normal font

2013-08-28 Thread Shree Devi Kumar
training images and box files. On Fri, Aug 23, 2013 at 10:54 PM, Davide Pioggia wrote: > Hi All, > I'm doing OCR on documents written in normal Times Roma

Re: OCR romanized Asian languages

2013-08-28 Thread Shree Devi Kumar
Please post a sample image. Have you tried the Vietnamese language data or VietOCR (http://vietocr.sourceforge.net/)? On Wed, Aug 28, 2013 at 1:49 PM, JOSE MARIA

Re: OCR char restriction

2013-08-29 Thread Shree Devi Kumar
For details regarding the bazaar pattern, see the section on config files in the man page (http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html): > Now, if you pass the word *bazaar* as a trailing command line parameter to Tesseract, Tesseract will not bother loading the system dictionary nor the
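The helper below is a sketch of what that man page section describes. It writes a `bazaar` config that turns off the system and frequent-word dictionaries and points tesseract at `<lang>.user-words` and `<lang>.user-patterns` files. The four config lines are meant to mirror the man page example, so check them against the page for your version. The helper name and the tessdata path argument are assumptions.

```shell
# make_bazaar_config (hypothetical helper): write <tessdata>/configs/bazaar.
# $1 = path to your tessdata directory (an assumption; adjust to your install).
make_bazaar_config() {
  local tessdata_dir="$1"
  mkdir -p "$tessdata_dir/configs"
  # F = false: do not load the system and frequent-word dictionaries.
  # The suffixes select <lang>.user-words and <lang>.user-patterns.
  cat > "$tessdata_dir/configs/bazaar" <<'EOF'
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
user_patterns_suffix user-patterns
EOF
}
```

Place `eng.user-words` and `eng.user-patterns` in tessdata. Then select the config by name as the trailing argument, for example `tesseract image.tif out -l eng bazaar`.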

Re: Digits recognition problem need advice

2013-08-31 Thread Shree Devi Kumar
Have you tried GIMP in batch mode? See http://www.gimp.org/tutorials/Basic_Batch/ http://gimpfr.org/contrib_photolabo.php http://registry.gimp.org/node/23499 On Sat

Re: Traineddata for Latin-Indic

2013-09-02 Thread Shree Devi Kumar
ic number 20617 (0x5089). processing san.mnt.exp424.png TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089). Press any key to continue . . . Should I open issues for the above?
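The reported magic number is itself a clue. 20617 is 0x5089: the two bytes 0x89 0x50 read as a little-endian 16-bit value. A PNG file begins with exactly those bytes (0x89 'P' 'N' 'G'). So the messages suggest that PNG images (note the file name san.mnt.exp424.png) were handed to code expecting TIFF. The hypothetical helper below checks a file's leading signature bytes using only od and tr.

```shell
# image_type (hypothetical helper): print png / tiff-le / tiff-be / unknown
# based on the first four bytes of the file given as $1.
image_type() {
  local sig
  # Hex dump of the first 4 bytes, e.g. "89504e47" for a PNG.
  sig=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case "$sig" in
    89504e47) echo png ;;       # 0x89 'P' 'N' 'G'
    49492a00) echo tiff-le ;;   # "II*\0": little-endian TIFF
    4d4d002a) echo tiff-be ;;   # "MM\0*": big-endian TIFF
    *)        echo "unknown ($sig)" ;;
  esac
}
```

For example, `for f in *.png *.tif; do echo "$f: $(image_type "$f")"; done` lists which inputs are really PNG.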

Re: Tesseract for windows x86,x64 and windows phone 8/WINRT

2013-10-15 Thread Shree Devi Kumar
dropdown. What do I need to do to get it to recognize Hindi text? Thanks, Shree On Tue, Oct 15, 2013 at 2:43 PM, Rui Barbosa wrote: > Hi to all, > My name

Re: Tesseract for windows x86,x64 and windows phone 8/WINRT

2013-10-15 Thread Shree Devi Kumar
Thanks, it works now. I had to put the traineddata files under bin/Release/Tessdata, though. On Tue, Oct 15, 2013 at 10:12 PM, Rui Barbosa wrote: > Hi Sh

Re: error during installation of tesseract-3.01 with leptonica-1.69 in ubuntu 13.04

2013-10-17 Thread Shree Devi Kumar
ps.google.com/d/msg/tesseract-dev/Z1lTKePp-hY/kUtGy4gdNS4J>. On Thu, Oct 17, 2013 at 12:14 PM, zdenko podobny wrote: > On Thu, Oct 17, 2013 at 8:06 AM,

Re: shapeclustering

2013-10-17 Thread Shree Devi Kumar
mv inttemp odia.inttemp
mv pffmtable odia.pffmtable
mv shapetable odia.shapetable
combine_tessdata .\odia.
On Thu, Oct 17, 2013 at 12:42 PM, mama wrote: > I
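The renaming step in the message can also be written as a loop. This is a sketch under stated assumptions. The file list covers only the three files visible in the excerpt (inttemp, pffmtable, shapetable). A 3.02 training run also produces files such as unicharset and normproto, which need the same prefix if you include them. prefix_training_files is a hypothetical helper name.

```shell
# prefix_training_files (hypothetical helper): rename training outputs to
# <lang>.<file> so that combine_tessdata can pick them up.
# $1 = language code, e.g. odia. Extend the list with the other files
# your training run produced (unicharset, normproto, ...).
prefix_training_files() {
  local lang="$1" f status=0
  for f in inttemp pffmtable shapetable; do
    if [ -e "$f" ]; then
      mv "$f" "$lang.$f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

On Unix the whole step then becomes `prefix_training_files odia && combine_tessdata odia.`. The message above uses the Windows form `.\odia.`.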

Re: pre-processing recommendation required

2013-10-18 Thread Shree Devi Kumar
though the whole page is in the same font and size. I am not sure whether this is the cause in your case, though. I am hoping that 3.03 will take care of some of these issues; I am still waiting for the source to compile on Windows. Shree

Re: FAILURE! Couldn't find a matching blob issues, Need Advice

2013-10-28 Thread Shree Devi Kumar
I think your training data should be more than one line. Create a page of text and see if that works. On Mon, Oct 28, 2013 at 7:59 AM, Jonathan Nikkel wrote: >

Re: Is it correct order of words in Arabic?

2013-11-10 Thread Shree Devi Kumar
Please see http://code.google.com/p/tesseract-ocr/issues/detail?id=899&can=1&q=arabic https://github.com/reza1615/PersianOcr The suggestions for Persian would also apply to Arabic and Urdu.

Re: where to download the latest boxtiff for eng.traineddata(3.02)?

2013-11-22 Thread Shree Devi Kumar
use your traineddata in addition to the official file. On Fri, Nov 22, 2013 at 1:23 PM, Сергей Якушевич wrote: > > I was thinking that it is possible deco

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-26 Thread Shree Devi Kumar
For a GUI, you can try VietOCR (http://sourceforge.net/projects/vietocr/files/vietocr/). For language data for Sanskrit transliteration, try http://sourceforge.net/projects/tesseracthindi/files/Tesseract-3-02-SanskritTransliteration/

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-27 Thread Shree Devi Kumar
/ü/ū/g I am also attaching the sed script as a UTF-8 text file. On Wed, Nov 27, 2013 at 3:45 PM, V S Rawat wrote: > those Ā á character are defined in Garamond font, but

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-27 Thread Shree Devi Kumar
sed -f roman.sed inputfile.txt > outputfile.txt
You will have to add other substitutions to roman.sed; it only has the first few substitutions that I encountered.
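Here is a small sketch of that workflow: create roman.sed with one substitution per line and run it over the OCR output. Only the ü → ū rule comes from this thread. fix_diacritics is a hypothetical name, and you must add further rules for the other misrecognised characters you find.

```shell
# Write the sed script (keep it UTF-8); add one s/wrong/right/g line
# per misrecognised character.
cat > roman.sed <<'EOF'
s/ü/ū/g
EOF

# fix_diacritics (hypothetical helper): $1 = OCR output, $2 = corrected file.
fix_diacritics() {
  sed -f roman.sed "$1" > "$2"
}
```

Usage: `fix_diacritics inputfile.txt outputfile.txt`, which is equivalent to the sed command above.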

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-27 Thread Shree Devi Kumar
result, so using the OCR output and post-processing it may not be the best solution in this case. You could try the Windows version of sed (http://gnuwin32.sourceforge.net/packages/sed.htm). I only tested it on one paragraph of text from page 11. Shree

Re: Need help in recognizing english texts with sanskrit roman diacritical marks.

2013-11-28 Thread Shree Devi Kumar
You may want to look at a program called SANSKRITOCR. The old version was free; there is also a new commercial version. Please see http://www.sanskritreader.de/ On

Re: Howto train documents with unknown font?

2013-12-06 Thread Shree Devi Kumar
You can use a 'fake-font' name for the unknown fonts, giving a different name to each typeface. On Fri, Dec 6, 2013 at 8:33 PM, Ingo W. wrote: >

Re: Franken+ Released -- New Tool For Training Tesseract on Fonts from Page Images

2013-12-06 Thread Shree Devi Kumar
Matthew, I tried registering for Aletheia a few months ago and have had no response so far. Shree On Sat, Dec 7, 2013 at 2:57 AM, matthew christy wrote: > Hi Jan

Re: combine tessdata question

2013-12-10 Thread Shree Devi Kumar
You can create a new traineddata file with multiple 'fake-fonts' and then use it in addition to the existing traineddata, e.g. -l deu+newtraineddata. That way you don't need a separate traineddata for each font, though you'll have separate .tr files, one for each of your

Re: makebox + languagefile

2013-12-10 Thread Shree Devi Kumar
tesseract newlang.fakefont.exp0.tif newlang.fakefont.exp0 -l deu batch.nochop makebox
Why don't you try the jTessBoxEditor GUI for it? On Tue, Dec 10, 2013 at

Re: Problems using Tesseract with Hindi

2014-01-08 Thread Shree Devi Kumar
You can try it using VietOCR. See the attachment with instructions at https://groups.google.com/forum/#!topic/sanskrit-programmers/jyvRGnMWXiQ On Tue, Jan 7

Re: Problems using Tesseract with Hindi

2014-01-20 Thread Shree Devi Kumar
http://sourceforge.net/projects/vietocr/ On Mon, Jan 20, 2014 at 4:13 AM, Ravi Roshan wrote: > Thank you sir, > > Whatever you instruct me its working, b

Re: [tesseract-ocr] Re: Font Limit = 64 fonts in traineddata, really ??

2014-07-08 Thread Shree Devi Kumar
forthcoming in the future. Ray/Zdenko/Nick may be able to give an idea of the expected release timeline. On Tue, Jul 8, 2014 at 5:04 PM, Paul wrote: > If you have a look

Re: [tesseract-ocr] Re: Regarding Tesseract OCR engine for recognizing Tamil Fonts

2014-07-20 Thread Shree Devi Kumar
tools and traineddata for other languages may be forthcoming during the next few months, but no one knows when. Shree On Sun, Jul 20, 2014 at 10:07 PM, sibi kanagaraj wrote: > Hi , > > Sorry for my delayed reply . > > Thank you Paul and Nick for your Inputs . > > @ Paul ,

Re: [tesseract-ocr] Need help for Hindi output

2014-07-28 Thread Shree Devi Kumar
You can try VietOCR for it. Please see http://sourceforge.net/projects/tesseracthindi/files/OCRHindi_using_VietOCR_and_Tesseract.pdf/ for instructions (about one year old).

Re: [tesseract-ocr] Building training tools from source

2014-08-01 Thread Shree Devi Kumar
patch -p1 > debuild -us -uc > cd .. > sudo dpkg -i *.deb On Fri, Aug 1, 2014 at 5:47 PM, Peter Hamberg wrote: > I knew it had to be something obvious. I

Re: [tesseract-ocr] Re: Missing detailed documentation about Unicharset files

2014-08-06 Thread Shree Devi Kumar
s can be compiled on Windows, or do I need to get access to Linux somewhere to give them a try? On Wed, Aug 6, 2014 at 8:23 PM, Nick White wrote: > Hi Albre

Re: [tesseract-ocr] Re: Regarding Tesseract OCR engine for recognizing Tamil Fonts

2014-08-07 Thread Shree Devi Kumar
TAMIL LETTER LLA and the last part of பௌ 0BCC TAMIL VOWEL SIGN AU (combined with pa (ப)). The files include tam.traineddata, which can be used with VietOCR to test OCR of Tamil texts.

Re: [tesseract-ocr] Error when running "make" - scanutils.cpp:38:14: error: typedef redefinition with different types ('long' vs '__darwin_off_t' (aka 'long long'))

2014-08-12 Thread Shree Devi Kumar
Thanks, Cory. Nick, it may be helpful to add/update the instructions in the wiki. On Tue, Aug 12, 2014 at 4:31 AM, testing1234 wrote: > Note.. Step 5 above the l

Re: [tesseract-ocr] Where can I find the tessdata and training/langdata?

2014-08-15 Thread Shree Devi Kumar
/ tessdata On Fri, Aug 15, 2014 at 1:38 PM, SHEN Fei wrote: > If I'm right, some of tessdata/ is part of tesseract repo, see > https://github.com/tesseract-oc

Re: [tesseract-ocr] Re: list_available_fonts.

2014-08-19 Thread Shree Devi Kumar
Please see http://www.cyberciti.biz/tips/quickly-list-all-available-fonts.html Maybe that will provide the info you need. On Tue, Aug 19, 2014 at 1:18 PM, zdenko

Re: [tesseract-ocr] Re: list_available_fonts.

2014-08-19 Thread Shree Devi Kumar
or:
text2image --list_available_fonts --fonts_dir=/usr/share/fonts/
On Tue, Aug 19, 2014 at 1:51 PM, Shree Devi Kumar wrote: > Please see >

[tesseract-ocr] compiling leptonica 1.71 under mingw on windows8

2014-08-20 Thread Shree Devi Kumar
ursive
make[1]: Entering directory `/home/User/leptonica-1.71'
Making all in src
make[2]: Entering directory `/home/User/leptonica-1.71/src'
  CC adaptmap.lo
  CC affine.lo
  CC affinecompose.lo
  CC arrayaccess.lo
and it seems to be hanging at this stage. How long does

Re: [tesseract-ocr] Makefile:372: recipe for target 'all' failed - using current version with leptonica 1.71 on cygwin

2014-08-21 Thread Shree Devi Kumar
Zdenko, yes, but the file is there, and I was able to run it with sh autogen.sh. Please see the messages below. On Thu, Aug 21, 2014 at 12:28 PM, zdenko podobny

Re: [tesseract-ocr] Makefile:372: recipe for target 'all' failed - using current version with leptonica 1.71 on cygwin

2014-08-21 Thread Shree Devi Kumar
Hi Zdenko, ./ is confusing for me :-) Yesterday I tried with MinGW and MSYS, but the process hung while compiling Leptonica, so today I tried with Cygwin. Here is the version info under Cygwin: gcc version 4.8.3 (GCC), automake 1.14, autoheader 2.69, autoconf 2.69

Re: [tesseract-ocr] 3.03 compilation problems on FreeBSD.

2014-08-21 Thread Shree Devi Kumar
On Windows, the following worked for me (as given in http://vorba.ch/2014/tesseract-cygwin.html). I am using Leptonica 1.71.
./configure LDFLAGS=-L/usr/local/lib
On

[tesseract-ocr] Re: compiling leptonica 1.71 under mingw on windows8

2014-08-22 Thread Shree Devi Kumar
n MSYS bug: > http://sourceforge.net/p/mingw/bugs/1950/ > The workaround is to use "make -j1" or to downgrade to MSYS 1.0.17 until MSYS 1.0.19 is rel

[tesseract-ocr] tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-08-26 Thread Shree Devi Kumar
Follow the instructions on https://sourceforge.net/p/msys2/wiki/MSYS2%20installation/ to set up MSYS2. Alexx83 posted a comment on ticket #71

[tesseract-ocr] Re: tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-08-26 Thread Shree Devi Kumar
Please note that this does NOT install any language data. On Tue, Aug 26, 2014 at 1:05 PM, Shree Devi Kumar wrote: > Follow instructions on

[tesseract-ocr] Re: [tesseract-dev] Re: tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-08-27 Thread Shree Devi Kumar
remove public tesseract > repository. > > Zdenko > > > On Wed, Aug 27, 2014 at 3:46 AM, shree wrote: > >> Zdenko, >> >> Sorry it was not meant to be a 'release' of 3.04, I just wanted to get >> the latest code compiled under msys2 and asked the develo

Re: [tesseract-ocr] Re: Regarding Tesseract OCR engine for recognizing Tamil Fonts

2014-08-27 Thread Shree Devi Kumar
langdata&r=9204c02c18daedaedc8aeaab1c1dd99e544cc932 All training-related files for Tamil are at https://code.google.com/p/tesseract-ocr/source/browse/tam/?repo=langdata&r=9204c02c18daedaedc8aeaab1c1dd99e544cc932 Hope this helps. Shree

Re: [tesseract-ocr] Re: I got some error during training regular(?) box-tiff data. [tesseract 2.04 version]

2014-09-02 Thread Shree Devi Kumar
As per the project page (https://code.google.com/p/tesseract-ocr/), version 3.02 ships with Ubuntu 12.04. English traineddata for that version is available at https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.eng.tar.gz&can=2&q=

Re: [tesseract-ocr] Training tesseract 3.03 in a custom C and C++ code using C-API

2014-09-02 Thread Shree Devi Kumar
Have you looked at jTessBoxEditor? It has the whole training function built in. On Tue, Sep 2, 2014 at 5:38 PM, Dovhani Foneworx wrote: > Good day, I h

Re: [tesseract-ocr] Re: I got some error during training regular(?) box-tiff data. [tesseract 2.04 version]

2014-09-02 Thread Shree Devi Kumar
OK. I am not familiar with tesseract 2. Maybe some of the other members can help. On Tue, Sep 2, 2014 at 3:03 PM, Choi wrote: > Thanks for your answer. > B

Re: [tesseract-ocr] Training tesseract 3.03 in a custom C and C++ code using C-API

2014-09-02 Thread Shree Devi Kumar
Quan has provided the source for his program (see http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/). You can ask him for details. On Wed, Sep 3, 2014

Re: [tesseract-ocr] Detect only Alphanumeric characters

2014-09-03 Thread Shree Devi Kumar
Did you unpack the eng.traineddata first to get all the files? On Wed, Sep 3, 2014 at 9:20 PM, John Nilson wrote: > > Any help would be greatly appre

Re: [tesseract-ocr] Detect only Alphanumeric characters

2014-09-04 Thread Shree Devi Kumar
http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html Use combine_tessdata -u to unpack all the files from the traineddata file; they include the unicharset. I am not familiar with the Cube files that you are changing, so I can't comment on that. Shree

Re: [tesseract-ocr] Detect only Alphanumeric characters

2014-09-04 Thread Shree Devi Kumar
You may also be able to do this by passing a config file as a parameter at runtime. I haven't tried it with 'whitelist', though. On Thu, Sep 4, 2014 at 9:14
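Here is a sketch of that config-file approach, which is untested with tesseract here, as the message says. It writes a one-line config that sets the tessedit_char_whitelist variable to ASCII letters and digits. The file name alnum, the helper make_alnum_config and the example tessdata path are all hypothetical. Named configs are normally placed under tessdata/configs.

```shell
# make_alnum_config (hypothetical helper): write a one-line config that
# restricts recognised characters to A-Z, a-z and 0-9.
# $1 = output path, e.g. <your tessdata>/configs/alnum (location assumed).
make_alnum_config() {
  printf '%s %s\n' tessedit_char_whitelist \
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' > "$1"
}
```

Then pass the config name as the trailing argument, for example `tesseract image.png out alnum`.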

Re: [tesseract-ocr] Re: Post-correction of OCR-generated text

2014-09-05 Thread Shree Devi Kumar
Interesting paper re TICCL. I am wondering whether tesseract is using a similar approach for the 3.04 language data, with the unigram and bigram lists along with 'clean' word lists. See section 4.4, processing steps.

Re: [tesseract-ocr] Does tesseract 3.03 return 3.02 with -version ?

2014-09-07 Thread Shree Devi Kumar
What source file did you use for compiling tesseract? On Sun, Sep 7, 2014 at 11:06 PM, wrote: > I recently manages to compile tesseract 3.03 with lots of probl

Re: [tesseract-ocr] Does tesseract 3.03 return 3.02 with -version ?

2014-09-08 Thread Shree Devi Kumar
NOT released, only for developers, testers, etc.)
git clone https://code.google.com/p/tesseract-ocr/
On Mon, Sep 8, 2014 at 2:56 PM, Nicolas Nickisch wrote: >

[tesseract-ocr] compile error under ubuntu 14.04

2014-09-09 Thread Shree Devi Kumar
data ... ---- Any suggestions on how to fix this? Thanks, Shree

Re: [tesseract-ocr] Re: compile error under ubuntu 14.04

2014-09-09 Thread Shree Devi Kumar
tesseract 3.03 leptonica-1.71 libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 I will ask him to check this. Shree On Wed, Sep 10, 2014 at 6:56 AM, Jeff Breidenbach wrote: > This error comes from Leptonica 1.70. Tesseract now requires Leptonica > 1.71. > Leptonica 1.

Re: [tesseract-ocr] Re: read_params_file

2014-09-15 Thread Shree Devi Kumar
Please look at tesstrain.sh in the training directory. You can process multiple files in a loop. On Mon, Sep 15, 2014 at 12:01 PM, Dovhani Foneworx wrote: > Tha
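Here is a sketch of processing many files in a loop. It is not a copy of tesstrain.sh. The hypothetical helper below runs the 3.0x training step "tesseract X.tif X box.train" for every .tif in a directory that has a matching .box file. The TESSERACT variable is an illustrative override, so you can substitute another command such as echo for a dry run.

```shell
# train_all (hypothetical helper): for each X.tif with a matching X.box in
# directory $1, run "tesseract X.tif X box.train" to produce X.tr.
# Set TESSERACT to another command (e.g. TESSERACT=echo) for a dry run.
train_all() {
  local dir="$1" tif base
  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue            # no .tif files at all
    base="${tif%.tif}"
    if [ ! -f "$base.box" ]; then
      echo "skipping $tif (no $base.box)" >&2
      continue
    fi
    "${TESSERACT:-tesseract}" "$tif" "$base" box.train
  done
}
```

`TESSERACT=echo train_all path/to/training` prints the commands without running them. tesstrain.sh remains the more complete route, because it also covers the later training steps.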

Re: [tesseract-ocr] Re: [Clarification request] Is it possible to let Tesseract generate three output files i) text ii) hOCR iii) PDF in a *single* run ?

2014-09-16 Thread Shree Devi Kumar
Quan, can it also be done in the command-line version? Shree On Wed, Sep 17, 2014 at 7:03 AM, Quan Nguyen wrote: > You can use the new ResultRenderer API in v3.03

Re: [tesseract-ocr] Re: Improve recognize russian chars

2014-09-19 Thread Shree Devi Kumar
image. The output from your image is also attached. I am using a build of the latest source from git on Windows 8 under MSYS2. On Fri, Sep 19, 2014 at 9:27

[tesseract-ocr] Re: Need help reg pre-processing of image before ocr

2014-09-19 Thread Shree Devi Kumar
Do you still need a copy of the Sanskrit traineddata? On Fri, Aug 23, 2013 at 10:21 PM, mns_rao wrote: > Hi, > The result output of OCR also depends on train

[tesseract-ocr] Re: [tesseract-dev] Re: tesseract 3.04 can be downloaded as a package for msys2 (will work on windows)

2014-09-21 Thread Shree Devi Kumar
bbreviated hash tag of the current revision On Sat, Sep 20, 2014 at 4:20 AM, zdenko podobny wrote: > I tagged master branch in repository (AFAIK initial code co

Re: [tesseract-ocr] Re: Tesseract OCR for Recognize text with mathematical equation / operator

2014-10-01 Thread Shree Devi Kumar
Have you tried https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02.equ.tar.gz&can=2&q= (Math / equation detection module for Tesseract 3.02)? Try using the traineddata for two languages together (-l eng+equ) and see if you get better results.

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread Shree Devi Kumar
That error usually occurs when pdf.ttf and pdf.ttx are not in your tessdata directory. Please check that the files from https://code.google.com/p/tesseract-ocr/source/browse/#git%2Ftessdata are present in the tessdata directory pointed to by TESSDATA_PREFIX.
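Here is a quick way to check for those files, as a sketch: pass the tessdata directory itself. The caller supplies the path because the relationship between TESSDATA_PREFIX and that directory has differed between versions (pointing either at tessdata or at its parent). check_pdf_files is a hypothetical name.

```shell
# check_pdf_files (hypothetical helper): report whether pdf.ttf and pdf.ttx
# exist in the tessdata directory given as $1; returns non-zero if any
# of them is missing.
check_pdf_files() {
  local dir="$1" f missing=0
  for f in pdf.ttf pdf.ttx; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_pdf_files /path/to/tessdata && echo "PDF support files present"`.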

[tesseract-ocr] warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

2014-10-05 Thread Shree Devi Kumar
I get these warnings while compiling tesseract. Any suggestions on what to change to fix this?
User@HP ~/source
$ ./buildtess.sh
Fetching origin
remote: Counting objects: 4, done.
Unpacking objects: 100% (4/4), done.
From https://code.google.com/p/tesseract-ocr
   c0640a4..4c01561  master
