Re: Include Tesseract in C++ code

2012-04-29 Thread zdenko podobny
On Sun, Apr 29, 2012 at 9:51 AM, Pavel Mazniker wrote: > Hi, > > *Just compile tesseract-3.02 in mingw+msys environment and it will create > everything you need.* > Trying to build. > > I checked r724 - is that O.K. version 3.02? > > when running ./configure on the r724 root folder from mingw-ms

Re: tesseract under windows and paths

2012-04-29 Thread zdenko podobny
On Sun, Apr 29, 2012 at 4:10 PM, Pavel Mazniker wrote: > > Hi, >> >> I checked out full r724 from repository, - is it tesseract 3.02 version >> ( latest ) ? >> >> I get when running configure in mingw+msys system terminal: next >> >> "checking for pixCreate in -llept... no >> configure: lept

Re: Cube language model params for Thai Language

2012-05-01 Thread zdenko podobny
On Tue, May 1, 2012 at 9:46 AM, Pavel Mazniker wrote: > Hi, > > Where can I get > > cube language model params tha.cube.lm > > > for Thai language ? > > > I never heard about them. > Because when I put > tess.Init("tessdata",thai,tesseract::OEM_TESSERACT_CUBE_COMBINED) I got : > > > Cube ERROR

Re: Building tesseract 3.02 on Windows using mingw+msys

2012-05-01 Thread zdenko podobny
On Tue, May 1, 2012 at 11:56 AM, Pavel Mazniker wrote: > O.K. > > thanks I will check later again what is the problem at my place with the > building. > > Just wanted to know , is it possible to get the .dlls for leptonica and > tesseract *already built* on Windows ( x86/x64 ) using mingw ? > > T

Re: Not a very important question

2012-05-08 Thread zdenko podobny
On Tue, May 8, 2012 at 6:02 PM, Denis Lee wrote: > Greetings. First of all I want to say "great job guys!". And I have a > question: how can I set an exact location on image that should be > scanned for text? > Thanks and sorry for my lame english > > > SetRectangle[1]? [1] http://code.google.co

Re: shapeclustering crashes

2012-05-13 Thread zdenko podobny
On Sat, May 12, 2012 at 11:33 PM, Falke wrote: > Anyone? > > A little extra findings: I'm getting the same crash on three > different installations of Ubuntu 10.10 (Maverick) > > Does shapeclustering work for everyone out there? I'm the only one > out? > > if so, does the backtrace output indi

Re: What do you think about my program SunnyPage v1.0

2012-05-17 Thread zdenko podobny
can somebody stop this spammer, please? -- Zdenko On Thu, May 17, 2012 at 8:32 AM, Levan Gelashvili wrote: > What do you think about my program SunnyPage v1.0. It uses the > tesseract 3.02 alpha. > You can use it for Training Tesseract. > For Training is free! > > www.sunnypage.ge > > -- > You

Re: What are the real requirements for training?

2012-05-17 Thread zdenko podobny
First of all, I suggest using version 3.02 (even though it is not officially released). IMHO there is only one additional step compared to 3.01 training (it should be run after the step "Compute the Character Set"[1] and before "Clustering": shapeclustering -F font_properties -U unicharset lang.fontname.exp1.

Re: List of Config Parameters

2012-05-21 Thread zdenko podobny
IMO the best way is to search for '_MEMBER' (or ' _VAR_H ' in "*.h" ???) as suggested (indirectly :-) ) in Visual Studio 2008 Developer Notes for Tesseract-OCR -> Handy free tools [1]. [1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/tools.html#id2 -- Zdenko On Mon, May 21, 2012 at

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-22 Thread zdenko podobny
On Tue, May 22, 2012 at 12:14 AM, Galt wrote: > I should begin by saying that I am grateful and happy to have > a very nice searchable pdf of an old book thanks to Tess. > > I found this on the web: > > > > > https://github.com/steelThread/mimeograph/commit/b29af3338e8f15b22392b4e313c8688d9950e1

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-22 Thread zdenko podobny
On Tue, May 22, 2012 at 2:03 PM, Galt wrote: > > > > > > Please create issue with description what is output and how it should > be... > > > Until then I have forced to make a little hack to pdfbeads to get it > > > to read the position > > > and word from ocr_word and ocrx_word respectively so t

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-22 Thread zdenko podobny
On Tue, May 22, 2012 at 10:12 PM, zdenko podobny wrote: > > > On Tue, May 22, 2012 at 2:03 PM, Galt wrote: > >> >> > >> > > Please create issue with description what is output and how it should >> be... >> > > Until then I have forced t

Re: Tesseract in Subtitle Edit

2012-05-22 Thread zdenko podobny
On Tue, May 22, 2012 at 8:53 PM, Hallur Guðjónsson wrote: > I don't know if any one you guys have used Subtitle Edit for Windows, > but it uses tesseract to OCR the subpictures ripped from dvds. And > after I added the Icelandic language pack (which I extracted from a > debian file) it crashes. Is

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
s version > of Icelandic, maybe it simply doesn't exist. But is there a way to > convert this to a windows compatible version? > > On May 23, 6:27 am, zdenko podobny wrote: > > On Tue, May 22, 2012 at 8:53 PM, Hallur Guðjónsson >wrote: > > > > > I don'

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
ld >>> differ on different platforms. I still haven't found a windows version >>> of Icelandic, maybe it simply doesn't exist. But is there a way to >>> convert this to a windows compatible version? >>> >>> On May 23, 6:27 am, zdenko podobny wr

Re: Tesseract in Subtitle Edit

2012-05-23 Thread zdenko podobny
didn't find an Icelandic pack. So I googled an > >>> Icelandic language pack and found the debian one and I thought it > >>> would be the same type of document and only the program itself would > >>> differ on different platforms. I still haven't found a

Re: Latin (Roman antiquity!) alphabet training

2012-05-23 Thread zdenko podobny
On Wed, May 23, 2012 at 11:10 PM, Falke wrote: > From what I see, there is no traineddata for the Roman latin > alphabet. Essentially, the current eng.traineddata's shortcoming is > its lack of the macron diacritic. > > Is it possible to add the macron glyphs to the already-existing > eng.traine

Re: Tesseract vs. Commercial OCR

2012-05-24 Thread zdenko podobny
On Thu, May 24, 2012 at 8:08 AM, nikolaykhl wrote: > I agree that Abbyy will do the job more accurate out of the box and is > easier to get started with. > You may also want to have a look at this article: > http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison > > This comparis

Re: Tess3.01 hocr output not working with pdfbeads

2012-05-26 Thread zdenko podobny
Discussion can be found in the (closed and open) issues (;-) ). Initial hOCR support[1] comes from issue 263[2] and was submitted by amkryukov. As you can see, this patch implemented 'ocr_word' and 'xocr_word'. They are not part of the hOCR spec. 'xocr_word' was changed[3] to 'ocrx_word' based on issue is

Re: Page layout analysis - don't split columns.

2012-05-27 Thread zdenko podobny
In 3.0x you can set the page segmentation mode (search for SetPageSegMode or the variable "tessedit_pageseg_mode"). I think the proper mode should help you. If I remember correctly, there was a report here on the forum about how to compile the current tesseract for Android. -- Zdenko On Sun, May 27, 2012 at 12:06 PM, Joe
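
The variable named in this reply can be set from a small config file. The following is a minimal sketch, not taken from the thread: the file names are placeholders, mode 6 (a single uniform block of text) is only one plausible choice for keeping columns together, and passing a config file by its path as the last argument is an assumption about how the command-line tool resolves config names.

```shell
#!/bin/sh
# Sketch: choose a page segmentation mode so that a page is not split
# into columns. "page.tif", "out" and "single_block.cfg" are placeholders.
image="page.tif"
psm=6   # 6 = assume a single uniform block of text

# Write the variable from the reply into a config file.
printf 'tessedit_pageseg_mode %s\n' "$psm" > single_block.cfg

# Run tesseract only when it is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    tesseract "$image" out single_block.cfg || echo "tesseract failed"
else
    echo "would run: tesseract $image out single_block.cfg"
fi
```

If mode 6 merges too much, mode 4 (a single column of text of variable sizes) may be worth trying instead.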

Re: Training tesseract

2012-05-27 Thread zdenko podobny
Just a small correction: tesseract-ocr 3.0x does not use libtiff directly, but via leptonica. -- Zdenko On Sun, May 27, 2012 at 12:25 PM, Stane wrote: > 1. > Once libtiff is properly installed you shouldn't get any problems later > on. > An alternative to the multipage things is to have each page

Re: Different results for v3.01 and current code line

2012-05-28 Thread zdenko podobny
On Mon, May 28, 2012 at 8:18 PM, steve8918 wrote: > Hi, I'm finding a discrepancy between the results I get from version 3.01 > and the most current code in svn. > > I'm working on WinXP SP3 32-bit, and using Visual Studio 2008 (regular > version, not express). > > Using the tesseract-ocr-setup-3

Re: language options for hebrew

2012-05-28 Thread zdenko podobny
On Tue, May 29, 2012 at 6:56 AM, stephen234 wrote: > I have noticed after installing Tesseract 3.01 in Windows (with > the .msi installer) can you send me link to .msi installer? > that there are three .traineddata files for > Hebrew: heb.traineddata, heb-ras.traineddata, heb-seg.traineddata. A

Re: Different results for v3.01 and current code line

2012-05-30 Thread zdenko podobny
As far as I know, Ray is testing the new version on Ubuntu. There should be no difference in output based on platform. As you can see from the commit log[1], revisions 645-670 implement version 3.02, so of course several revisions in this range could be broken. [1] http://code.google.com/p/tessera

tesseract testing suite

2012-06-01 Thread zdenko podobny
Hi all, Does anybody have (or is anybody working on) some kind of "tesseract testing suite"? I am looking for a tool that would enable me to evaluate the quality of tesseract output. I would like to play with image quality, tesseract settings, maybe training... -- Zdenko -- You received this message because

Re: unicharset matching upper and lower case letters

2012-06-01 Thread zdenko podobny
A description of unicharset is in its manual page[1]. Also, in the past I found that some information is missing from the unicharset generated by unicharset_extractor (e.g. 'script' is NULL, and 'glyph_metrics' is IMO useless). This is one of the reasons why I am looking for a test suite - to see if adding such i

Re: Character confidence

2012-06-02 Thread zdenko podobny
Have a look at issue 714[1]. The reporter also proposed a fix (you will need to change tesseract code). [1] http://code.google.com/p/tesseract-ocr/issues/detail?id=714 -- Zdenko On Sat, Jun 2, 2012 at 10:32 AM, hiran.suvrat wrote: > Hi

Re: tesseract testing suite

2012-06-03 Thread zdenko podobny
-- Zdenko On Fri, Jun 1, 2012 at 11:19 AM, Nick White wrote: > Hi Zdenko, > > On Fri, Jun 01, 2012 at 10:30:49AM +0200, zdenko podobny wrote: > > Does anybody has (is working on) some kind of "tesseract testing suite"? > > > > I am looking for some tool tha

Re: no Microfeat after mfttraining stage

2012-06-06 Thread zdenko podobny
Are you training for the unreleased 3.02 alpha version ;-) ? Well, there is no update for training (yet). I have started to write up my notes[1] on what I found (just for me ;-) ) - at the moment there is not a lot of information and maybe there are some things that I misunderstood ;-) . [1] http://www.sk-sp

Re: FAILURE! Couldn't find a matching blob in create the box.train

2012-06-07 Thread zdenko podobny
Your input image file is bad. You will get errors unless you fix it. Tesseract uses binary images[1] for OCR (and training) - see the attachment (it was created via api->DumpPGM, and I converted it from pgm to png). [1] http://en.wikipedia.org/wiki/Binary_image -- Zdenko On Thu, Jun 7, 2012 at 5:11 AM

Re: number-dawg and punc-dawg

2012-06-07 Thread zdenko podobny
http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components : lang.punc-dawg (Optional) A dawg made from punctuation patterns found around words. The "word" part is replaced by a single space. lang.number-dawg (Optional) A dawg made from tokens which originally containe
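
To make the punc-dawg description concrete, here is an illustration only. It turns a few made-up tokens into punctuation patterns by replacing the word part with a single space. Treating the first run of letters or digits as "the word part" is an assumption; the real dawg is built by Tesseract's own training tools, which are not used here.

```shell
#!/bin/sh
# Illustration: replace the word part of each token (taken here to be the
# first run of letters or digits) with a single space, leaving only the
# surrounding punctuation. The sample tokens are invented.
printf '%s\n' '"hello",' '(world)' 'end.' |
    sed -E 's/[[:alnum:]]+/ /' > punc_patterns.txt

cat punc_patterns.txt
```

Each output line keeps the punctuation in its original position around one space, which is the shape of entry the description refers to.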

Re: unicharset script and metrics questions

2012-06-07 Thread zdenko podobny
On Thu, Jun 7, 2012 at 12:29 PM, Nick White wrote: > On Thu, Jun 07, 2012 at 08:22:27AM +0200, zdenko podobny wrote: > > I start to put my notes[1] what I found (just for me ;-) ) - at the > moment > > there is not a lot of information and maybe there are some things that

Re: unicharset script and metrics questions

2012-06-07 Thread zdenko podobny
elp, > > Steve > > > On Thursday, June 7, 2012 6:10:01 AM UTC-7, zdpo wrote: > >> >> >> On Thu, Jun 7, 2012 at 12:29 PM, Nick White <> wrote: >> >>> On Thu, Jun 07, 2012 at 08:22:27AM +0200, zdenko podobny wrote: >>> > I st

Re: tesseract testing suite

2012-06-08 Thread zdenko podobny
On Fri, Jun 8, 2012 at 11:05 AM, Nick White wrote: > Hi Zdenko, > > On Fri, Jun 01, 2012 at 10:30:49AM +0200, zdenko podobny wrote: > > Does anybody has (is working on) some kind of "tesseract testing suite"? > > I'm now at a stage that some testing scripts wo

Re: number-dawg and punc-dawg

2012-06-08 Thread zdenko podobny
On Fri, Jun 8, 2012 at 11:40 AM, Nick White wrote: > Hi Zdenko, > > I saw the descriptions you give below, I just wasn't very clear on > what they meant. > > On Thu, Jun 07, 2012 at 02:50:57PM +0200, zdenko podobny wrote: > > lang.punc-dawg > > (Optional) A

Re: low resolution digits recognition

2012-06-12 Thread zdenko podobny
yes it is. see [1]. [1] http://code.google.com/p/tesseract-ocr/wiki/FAQ#Output_without_result_or_bad_output -- Zdenko On Tue, Jun 12, 2012 at 11:40 AM, dima mertsalov wrote: > Is it possible to recognize this image > http://s004.radikal.ru/i205/1206/b4/bf4643a81085.png > I tried without succe

Re: Recognizing color-on-black screenshots of fixed fonts

2012-06-12 Thread zdenko podobny
On Wed, Jun 13, 2012 at 6:16 AM, Chris wrote: > For a project I want to recognize the text taken from screenshots from > programs and games. > > I have a lot of assumed knowledge which should help me with the > recognition: > > - The font used is usually arial 12pt, plus one or two others. > - Ba

Re: The most simple use of tesseract in C++ app

2012-06-14 Thread zdenko podobny
It looks like you need to specify the leptonica library for linking (-llept). On 14.6.2012 9:13, "pcollinse" wrote: > Thanks for this wonderfully simple example! > > It compiles for me on Ubuntu, but not on Mac OSX. Anyone know what I'm > missing here? > > $ g++ -o test test.cpp -I /opt/local/i
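
A link line with both libraries might look like the sketch below. The source file name "test.cpp" comes from the quoted command; the install prefix is a hypothetical value (the path in the quoted command is cut off), and the library name -ltesseract is an assumption about how the library was installed.

```shell
#!/bin/sh
# Sketch: add the Tesseract and Leptonica libraries to the link line.
# prefix is a hypothetical install location; adjust it to the local setup.
prefix="/usr/local"
cmd="g++ -o test test.cpp -I$prefix/include -L$prefix/lib -ltesseract -llept"
echo "$cmd"

# Attempt the build only when the compiler and the source file are present.
if command -v g++ >/dev/null 2>&1 && [ -f test.cpp ]; then
    $cmd || echo "build failed; check the include and library paths"
fi
```

Putting the -l options after the source file matters with some linkers, which resolve symbols in command-line order.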

Re: Changing tesseract output ?

2012-06-16 Thread zdenko podobny
On Sat, Jun 16, 2012 at 5:51 PM, Dan Peleg wrote: > Hey, > > When i run "tesseract.exe test.jpg output.txt -l eng" the output goes to > the output.txt file. > > Is there an option to make it output to the console? StandardOutput ? > Stdin? something else then a file? > No, there is no such option

Re: Dictionary

2012-07-07 Thread zdenko podobny
Try setting these variables (found in 3.02) to false: load_system_dawg load_freq_dawg load_punc_dawg load_number_dawg load_unambig_dawg load_bigram_dawg load_fixed_length_dawgs Of course, not every language data file has all these dictionaries. They are optional. In 3.02 there are more dictionaries
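
One way to set all of these at once is a config file, sketched below. The variable names are the ones listed in the reply; writing "F" as the false value, the file names, and passing the config file as the last command-line argument are assumptions about the config mechanism rather than something stated in the thread.

```shell
#!/bin/sh
# Sketch: write a config file that switches off the listed dictionaries.
# "no_dawgs.cfg", "image.tif" and "out" are placeholder names.
for var in load_system_dawg load_freq_dawg load_punc_dawg load_number_dawg \
           load_unambig_dawg load_bigram_dawg load_fixed_length_dawgs; do
    printf '%s F\n' "$var"
done > no_dawgs.cfg

cat no_dawgs.cfg

# Run tesseract only when it is installed and the image exists.
if command -v tesseract >/dev/null 2>&1 && [ -f image.tif ]; then
    tesseract image.tif out no_dawgs.cfg || echo "tesseract failed"
fi
```

Comparing the output with and without the config file shows how much the dictionaries influence a given image.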

Re: errors while building libtesseract.lib, How to solve the problem?

2012-07-15 Thread zdenko podobny
Please provide more detail: e.g. which version are you trying to build, and do the other configurations build OK? On 15.7.2012 13:29, "Q Kyubuem Lim" wrote: > I'm not a native english speaker..my writing may be wrong.. > > I'm struggled with building libtesseract.lib. > > while compiling with the project

Re: mftraining runtime error

2012-07-16 Thread zdenko podobny
On Mon, Jul 16, 2012 at 4:39 PM, Markos N. Dendrinos wrote: > when i try to run mftraining with 3 training sets > mftraining -F font_properties -U unicharset -O ell.unicharset > ell.timesregular.exp0.tr ell.timesregular.exp1.tr ell.timesregular.exp2.tr > the following error appears: > > Class->Nu

Re: errors while building libtesseract.lib, How to solve the problem?

2012-07-16 Thread zdenko podobny
This is strange - I would say it looks like there is a problem with the backslash. Maybe it is related to your local settings (internationalization?). But then this error should be consistent in all configurations... Are you sure you did not modify anything? -- Zdenko On Mon, Jul 16, 2012 at 2:51

Re: Using Tesseract in Visual Studio 2010

2012-07-17 Thread zdenko podobny
dawg2wordlist is included in version 3.02. It is available in svn. There is VS2008 solution and as far as I know VS2010 should be able to open/import it -- Zdenko On Tue, Jul 17, 2012 at 2:57 AM, blavatsky3 < nine.eleven.is.an.inside@gmail.com> wrote: > Hi TP, > > I was wondering if you

[Announcement] QT Box Editor 1.09

2012-07-19 Thread zdenko podobny
QT Box Editor 1.09 was released. It is a multi-platform visual editor for tesseract-ocr box files (used for OCR training) based on the QT4 library. Some of the featu

Re: [Announcement] QT Box Editor 1.09

2012-07-19 Thread zdenko podobny
r excellent QTBox Editor created by the team > of Zdenko Podobny. > > Trust in the forthcoming next version, another feature "generate > image/box files from the text file" > will be added as a crown to QTBox Editor program. > There are 2 programs that offers this

Re: tesseract visual studio 2010 c++

2012-07-22 Thread zdenko podobny
On Sat, Jul 21, 2012 at 11:04 PM, Nada Feteha wrote: > it is version tesseract 3.1 > > Well, there is no tesseract 3.1 (yet) ;-) I suggest you use version 3.02 from svn - there is a new solution (for VS2008[1], but IMO VS2010 should import it easily) that creates the library automatically. In develop

Re: Example of using pixReadMemTiff

2012-07-24 Thread zdenko podobny
have a look at leptonica documentation and example program ioformats_reg.c. -- Zdenko On Tue, Jul 24, 2012 at 12:35 PM, newtotesseract wrote: > Hi friends > > I am trying to use buffer for OCR with tesseract 3.02. > > Can someone please help me know, how do we give the correct value of the > si

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-07-24 Thread zdenko podobny
Do you have any problem with Tesseract-OCR VS 2008 Project you can try out? On Tue, Jul 24, 2012 at 9:01 AM, blavatsky3 < nine.eleven.is.an.inside@gmail.com> wrote: > Hi , Do you have a copy of the Tesseract-OCR VS 2010 Project I could try > out ? > > Thanks > > Richard > > > On Wednesday, Fe

Re: Error in building Tesseract-OCR

2012-07-24 Thread zdenko podobny
On Wed, Jul 25, 2012 at 12:21 AM, Nada Feteha wrote: > I try to build tesseract 3.02 on Visual Studio 2010 by this instruction > http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/building.html > > the problem in step 2 when I try to build the static library LIB_Debug , I > fond this error

Re: Replacing the tesseract 3.02 alpha vs2008 directory

2012-07-24 Thread zdenko podobny
Why such a strong feeling about a simple question ;-) ? There is a reason why I asked. I am trying to collect all open issues for the (not yet released) 3.02 version (and to fix them if possible). The officially supported solution is VS 2008, and until last week (now there are some reports) there was no informati

Re: Error in building Tesseract-OCR

2012-07-25 Thread zdenko podobny
Thanks for the info. There is a Post-Build Event command for the libtesseract302 project. The attachment contains a screenshot from VS2008 showing where you can find it. This command tests whether the output directory for the lib ("..\..\..\lib") exists. If it does not exist, it creates it with the command "md"

Re: can't compile the android project

2012-07-26 Thread zdenko podobny
On Thu, Jul 26, 2012 at 11:38 AM, Adam Dymitruk wrote: > I'm trying to compile the Android project found here: > http://code.google.com/p/tesseract-android-tools/source/list?repo=default > > but I'm getting an error: > > jni/com_googlecode_leptonica_android/common.h:22:24: fatal error: > allheade

Re: Error in building Tesseract-OCR

2012-07-26 Thread zdenko podobny
So I found out that Microsoft changed Macro "behavior" in VS2010(see e.g [1]), so you need to correct "Target name" manually (see attachment): LIB_Release: $(ProjectName)-static LIB_Debug: $(ProjectName)-static-debug DLL_Debug: $(ProjectName)d DLL_Release is ok by default... [1] http://social.msd

Re: hocr zero length ocrx_word elements

2012-07-26 Thread zdenko podobny
On Thu, May 17, 2012 at 6:38 PM, Carlos wrote: > using tesseract 3.01 > > the hocr of a diagram generated by tesseract includes an ocr_line > comprised of 0 length ocrx_words. For example: > > class='ocr_word' id='word_1_11' title="bbox 401 3418 652 3588"> class='ocrx_word' id='xword_1_11' titl

Re: hOCR Character Encoding Problem

2012-07-30 Thread zdenko podobny
On Mon, Jul 30, 2012 at 3:13 PM, Cian Mc Govern wrote: > Hi all, > > I'm using Tesseract with the hOCR output format. When I invoke Tesseract > on an image, the results are returned in hOCR format with a UTF-8 character > encoding. However, if I then convert the same image to TIFF format from > PN

Re: hOCR Character Encoding Problem

2012-07-31 Thread zdenko podobny
I did not find any UTF-8 character in your image, so IMO it is difficult to judge whether the text is encoded as UTF-8 or latin1 (there is no BOM...). I repeated the steps you mentioned (I just named the outputs output-jpg and output-tif) with these results (on openSUSE 12.1 64bit, with the latest svn code):

Re: hOCR Character Encoding Problem

2012-08-01 Thread zdenko podobny
Fixed in r736. Now 'file -i output-jpg.html' and 'file -i output-tif.html' report the same: application/xml; charset=utf-8 But I found out that (as a side effect?) hocr produces "empty words" in the xml. IMO it is not a problem (the xml is valid). -- Zdenko On Wed, Aug 1, 2012 at 11:25 AM, Cian Mc Govern wro

Re: hOCR Character Encoding Problem

2012-08-02 Thread zdenko podobny
As far as I remember the hocr spec (maybe it was another doc ;-) ), if the OCR does not recognize some part of the input image, it should save that area as an image (e.g. there will be no empty word but a link to an image). tesseract-ocr does not have this feature (yet). -- Zdenko On Thu, Aug 2, 2012 at 12:08 PM,

Re: errors when running tesseract.exe

2012-08-02 Thread zdenko podobny
Your command has the wrong syntax. Please run just tesseract.exe to see the correct syntax. Warnings regarding "unknown field" can be ignored (they are from libtiff) - search the internet for the reasons if you are interested. -- Zdenko On Thu, Aug 2, 2012 at 2:11 PM, js wrote: > C:\Program Files

Re: Procedure for creating tesseract302.so from tesseract

2012-08-02 Thread zdenko podobny
On Thu, Aug 2, 2012 at 1:50 PM, malshani hasanthika wrote: > Please anyone provide me the exact procedure to produce tesseract302.so > in ubuntu environment? I was able to build tesseract in ubuntu but the .so > file was not created. > > Did you run "sudo make install" ? > -- Zdenko -- You re

Re: Having traindata files uncombined

2012-08-11 Thread zdenko podobny
On Sat, Aug 11, 2012 at 12:46 PM, Chathuri Gunawardhana < lanch.gunawardh...@gmail.com> wrote: > Yes I was able to unpack them, added words to wordlist and word-freq files > created dawg from these 2 files and then pack all to create traindata. But > with newly created traindata also, tesseract do

Re: Having traindata files uncombined

2012-08-11 Thread zdenko podobny
On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana < lanch.gunawardh...@gmail.com> wrote: > Image that I'm trying to identify is attached. Most words in here are not > identified correctly. I added these words to user words and combined. But > still didn't get the expected output. > > your at

Re: Having traindata files uncombined

2012-08-11 Thread zdenko podobny
www.taprobanetravels.com/images/map-of-sri-lanka.jpg. It is high > quality than above. > > > On Sat, Aug 11, 2012 at 4:40 PM, zdenko podobny wrote: > >> >> On Sat, Aug 11, 2012 at 12:58 PM, Chathuri Gunawardhana < >> lanch.gunawardh...@gmail.com> wrote: >

Re: Having traindata files uncombined

2012-08-12 Thread zdenko podobny
> lanch.gunawardh...@gmail.com> wrote: > >> >> >> -- Forwarded message -- >> From: zdenko podobny >> Date: Sat, Aug 11, 2012 at 6:38 PM >> Subject: Re: Having traindata files uncombined >> To: tesseract-ocr@googlegroups.com >> >> >> Yea

Re: option digits

2012-08-20 Thread zdenko podobny
On Sat, Aug 18, 2012 at 1:52 AM, Francisco Lahuerta Calahorra < fcolahue...@gmail.com> wrote: > It seems that when the option digits is use, it does not include the > recognition of the following characters > point > comma > E > e > plus > minus > can somebody confirm this? > I can no

Re: Problem with simple images

2012-08-20 Thread zdenko podobny
On Mon, Aug 20, 2012 at 11:20 AM, gangabass wrote: > Hi, > I can't OCR simple gif files. Source files look like this one: > http://postimage.org/image/p42rbuv8d/. > > Also I have tried to convert it (using ImageMagick) to this one: > http://postimage.org/image/bnfzyhwtz/ > > But without any luck

Re: Character Template Matching Possible on Tesseract?

2012-08-20 Thread zdenko podobny
On Mon, Apr 9, 2012 at 9:47 PM, David Eger wrote: > The ability to use user patterns was added by Tesseract 3.01, and now has > a little documentation. See the comment in dict/trie.h: > > > http://code.google.com/p/tesseract-ocr/source/browse/tags/release-3.01/dict/trie.h > > And the newly updat

Re: Problem with simple images

2012-08-22 Thread zdenko podobny
On Wed, Aug 22, 2012 at 12:32 PM, gangabass wrote: > I have this for my sample image: "WL7@dr—e\cN1L7m—sens de" > > What's wrong with my tesseract? > There is nothing wrong with tesseract. Did you try to read FAQ[1]? [1] http://code.google.com/p/tesseract-ocr/wiki/FAQ#Output_without_result_or_b

Re: Integrating tesseract into Qt (C++ project)

2012-08-22 Thread zdenko podobny
On Wed, Aug 22, 2012 at 6:38 AM, Andres wrote: > Yes, it is possible. > > You have 2 choices: > > - you can call tesseract executable from your app. > > - you can use the api interface and link your project with tesseract. > If you want to use c++ API you can use tesseractmain.cpp[1] as example

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-22 Thread zdenko podobny
On Wed, Aug 22, 2012 at 2:40 PM, Davor Pleskina wrote: > Hello, > > I am complete *dumb* in VC++ and it is normal I run into problems as soon > as I try to build something, but this one could be simple yet out of reach > for me. > > I got source from SVN trunk and leptonica from their site. Howeve

Re: Integrating tesseract into Qt (C++ project)

2012-08-23 Thread zdenko podobny
What OS + IDE are you using? -- Zdenko On Thu, Aug 23, 2012 at 7:30 PM, Milan wrote: > Thank you all for the answers. > > Zdenko I have a really beginners question, what are all the steps I need > to take so I can start working on something similar to the application you > showed me[1]. My bigg

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-23 Thread zdenko podobny
On Thu, Aug 23, 2012 at 6:20 PM, Davor Pleskina wrote: > > > On Thursday, August 23, 2012 5:06:52 PM UTC+2, Nick White wrote: >> >> On Thu, Aug 23, 2012 at 07:01:42AM -0700, Davor Pleskina wrote: >> > Now I understand what my coworker meant by "being optimistic about >> getting help >> > via forum

Re: Re: Error while training tesseract from command line.

2012-08-24 Thread zdenko podobny
On Thu, Aug 23, 2012 at 8:24 PM, goetzibubu wrote: > I have a similar problem: > I use tesseract V3.01 0n WIN7. > I produced 6 *.tr files (deu.handschriftgoetz.exp1.tr > deu.handschriftgoetz.exp2.tr...) > and a file for font_properties handschriftgoetz.tr. > With mftraining I tested different entr

Re: Problem building libtesseract3.02 with MS VC++ 2008

2012-08-24 Thread zdenko podobny
On Thu, Aug 23, 2012 at 9:24 PM, Davor Pleskina wrote: > Hello, > > I tried to use 1.69. But it happens 1.68 has the same error (wrong > definition) in allheaders.h. Never mentioned leptprotos.h. > > http://tesseract-ocr.googlecode.com/svn-history/r683/trunk/vs2008/doc/setup.html : First create

Re: Integrating tesseract into Qt (C++ project)

2012-08-24 Thread zdenko podobny
First of all, you need the tesseract dependencies (leptonica + its dependencies). You can compile them yourself, but in that case you need to install the mingw+msys environment... then you need to compile the tesseract library. In the past I was successful in using a leptonica library built by VC++ ( leptonica-1.6

Re: Integrating tesseract into Qt (C++ project)

2012-08-24 Thread zdenko podobny
On Fri, Aug 24, 2012 at 12:32 PM, TP wrote: > On Fri, Aug 24, 2012 at 2:34 AM, zdenko podobny wrote: > > First of all you need tesseract dependencies (leptonica + its > dependencies). > > You can compile it by yourself, but in this case you need to install > > mingw+m

Re: Integrating tesseract into Qt (C++ project)

2012-08-26 Thread zdenko podobny
Have a look at https://github.com/zdenop/tesseract-mingw - there are mingw versions of (hopefully) all needed files. You should be able to integrate them easily to QT Creator project. -- Zdenko On Fri, Aug 24, 2012 at 11:34 AM, zdenko podobny wrote: > First of all you need tesser

Re: [tesseract-ocr] Makefile in master branch of tesseract-ocr/tesseract

2018-02-23 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/blob/master/INSTALL.GIT.md Zdenko 2018-02-23 21:53 GMT+01:00 : > I don't see Makefile in the master branch of tesseract-ocr/tesseract, Is > there a way for me to get it from other branches? I needed to install > tesseract from master branch to get the l

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-02-25 Thread Zdenko Podobny
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality Zdenko 2018-02-25 11:38 GMT+01:00 Dusayanta Prasad : > I am try to convert the below image using Tesseract in linux using the > following command: > > tesseract img.jpg out -l eng > > >

Re: [tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread Zdenko Podobny
it is official: https://github.com/tesseract-ocr/tesseract/releases Zdenko 2018-03-12 10:09 GMT+01:00 adarsh shukla : > There is no official release of tesseract 4.0 Beta. There might be some > unofficial release, not found anything as such in Google. > > On Monday, March 12, 2018 at 10:17:35 A

Re: [tesseract-ocr] What is different tesseract 4.00(alpha) from tesseract4.00(Beta) in details ?

2018-03-16 Thread Zdenko Podobny
Here are the details: https://github.com/tesseract-ocr/tesseract/commits/master Zdenko 2018-03-16 12:37 GMT+01:00 이경준 : > Hi ~ > > What is different tesseract 4.00(alpha) from tesseract4.00(Beta) in details > ? > > Thank You > > -- > You received this message because you are subscribed to the Google

Re: [tesseract-ocr] Compilation Error Tesseract 4.0 - macOS High Sierra

2018-03-17 Thread Zdenko Podobny
you specified that c++ compiler is: g++-6 and your system reports: g++-6: command not found Zdenko 2018-03-17 11:57 GMT+01:00 Richard McAlexander : > I'm having trouble compiling Tesseract 4.0. I have all dependencies > installed. The error occurs when after I run ./autogen.sh in the > terminal

Re: [tesseract-ocr] Compilation Error Tesseract 4.0 - macOS High Sierra

2018-03-17 Thread Zdenko Podobny
Why do you specify the compiler (especially if it cannot be found)? Zdenko 2018-03-17 19:07 GMT+01:00 Richard McAlexander : > Thanks. Anyone know how I can fix that? I have gcc/Xcode installed, not > sure why its not finding the command. > > On Saturday, March 17, 2018 at 9:10:38 AM UTC-4, zdenop wrot

Re: [tesseract-ocr] Tesseract output format: doc or docx

2018-03-22 Thread Zdenko Podobny
Tesseract can produce output in txt, pdf and hocr (html). Tesseract's focus is to provide an OCR engine, not complex document output like docx or ods. Zdenko 2018-03-22 7:47 GMT+01:00 : > Can I use tesseract in Ubuntu to get .docx or .doc output(word format). > > Currently .txt output is received
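
The three formats named in this reply can be requested as sketched below. The image name and the "out_" prefix are placeholders, and the sketch assumes that "pdf" and "hocr" are available as config names in the installed tessdata. It only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: build one tesseract command per output format named in the reply.
# "scan.png" and the "out_" prefix are placeholder names.

# Print the command for a given image and format (txt, pdf or hocr).
# Plain text is the default output, so it needs no config name.
ocr_command() {
    if [ "$2" = txt ]; then
        printf 'tesseract %s out_%s\n' "$1" "$2"
    else
        printf 'tesseract %s out_%s %s\n' "$1" "$2" "$2"
    fi
}

for fmt in txt pdf hocr; do
    ocr_command scan.png "$fmt"
done
```

For a Word document, one option is to convert the hocr or txt output with a separate tool, since tesseract itself does not write docx.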

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-01 Thread Zdenko Podobny
If you are really interested in help, then provide full information about how you run tesseract (the exact command). Zdenko 2018-03-31 20:19 GMT+02:00 JP T : > Hi > > I just updated from version 3.04.01 but now tesseract fails with above > message if I give the -psm option. > input files are PNG. > > any idea? >

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
... and it was exactly the same in tesseract 3.0x as in 4.0 Zdenko 2018-04-02 0:14 GMT+02:00 JP T : > Solved: > must be* tesseract infile outfile options* instead of standard unix *program > options infile outfile*. > On Sun 1 Apr, 2018, 7:25 PM JP T, wrote: > >> Hi >>> >>> I just updated f
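
The argument order reported as the solution in this thread can be sketched with a small helper. The function name and file names are hypothetical; "-l eng" is used only as an example option.

```shell
#!/bin/sh
# Sketch of the argument order: image first, output base second,
# options last. The file names are placeholders.

# Print a tesseract command line in that order.
tess_cmd() {
    image=$1
    outbase=$2
    shift 2
    echo "tesseract $image $outbase $*"
}

tess_cmd input.png result -l eng
```

The output base is given without an extension; tesseract appends one that matches the output format.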

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
The aim is to have a tool that is easily portable with minimal dependencies. IMO it is standard on Linux/Unix-like systems to use the --help option for an explanation of usage. Zdenko 2018-04-02 14:38 GMT+02:00 JP T : > Well, the problem is error handling. > If tesseract would have given a meaningful error mess

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
First of all: your command is wrong. It should be constructed this way: tesseract image output [options] See tesseract --help for more details. Next: the error message is clear: Error opening data file ./tessdata/Fraktur.traineddata You (or your installation) instructed to look for traineddata

Re: [tesseract-ocr] Error opening traineddata files on Mac High Sierra

2018-04-10 Thread Zdenko Podobny
If you followed someone's tutorial you should complain to its author ;-). I am not familiar with Mac, but on Linux you can do it (on the command line) this way: export TESSDATA_PREFIX=/usr/local/share/ Maybe it is similar on Mac. Try to google how to set an environment variable on Mac. Zdenko 2018-04-10 13
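The same environment variable can also be set from a script for a single invocation. The sketch below is an assumption-laden illustration, not a recommended setup: the helper names `env_with_tessdata_prefix` and `run_with_prefix` are hypothetical, and whether the value should name the tessdata directory itself or its parent depends on the Tesseract version and installation.

```python
import os
import subprocess


def env_with_tessdata_prefix(prefix, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    The caller's environment (or base_env) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(prefix)
    return env


def run_with_prefix(cmd, prefix):
    """Run cmd (an argv list) with TESSDATA_PREFIX set for that process only.

    Assumes the program named in cmd is installed and on the PATH.
    """
    return subprocess.run(cmd, env=env_with_tessdata_prefix(prefix), check=True)
```

Passing the variable through `env=` keeps the change local to one process, which is handy when several installations with different data directories coexist.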

Re: [tesseract-ocr] How to include tesseract 4.00 to my visual studio c++ ??

2018-04-12 Thread Zdenko Podobny
You should download the source and build and install it with cppan + cmake. See https://github.com/tesseract-ocr/tesseract/wiki/Compiling#develop-tesseract Zdenko 2018-04-11 4:21 GMT+02:00 : > i have been using tesseract 3.04 i could use it just by adding the include > file to my project, but wh

Re: [tesseract-ocr] install tesseract-4.00.00alpha error

2018-04-17 Thread Zdenko Podobny
You can start with using the latest version and providing details... Zdenko 2018-04-18 7:56 GMT+02:00 Kai Feng : > ./.libs/libtesseract.so: undefined reference to `omp_get_thread_num' > ./.libs/libtesseract.so: undefined reference to `GOMP_sections_end_nowait' > ./.libs/libtesseract.so: undefine

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread Zdenko Podobny
Time for upgrade? Zdenko 2018-04-21 22:14 GMT+02:00 'DR' via tesseract-ocr < tesseract-ocr@googlegroups.com>: > I'm using: > > tesseract 3.04.01 > leptonica-1.73 > libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : > libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

Re: [tesseract-ocr] Unsure why tesseract isn't returning the correct text

2018-04-21 Thread Zdenko Podobny
Really? Did you check it before writing to forum? Zdenko 2018-04-21 22:25 GMT+02:00 'DR' via tesseract-ocr < tesseract-ocr@googlegroups.com>: > Where can I find tesseract 4 beta? The github repo goes up to 4 alpha. > > On Saturday, April 21, 2018 at 2:21:49 PM UTC-6, zdenop wrote: >> >> Time for

Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread Zdenko Podobny
Well, you should contact the creator of the traineddata. We have no clue what they did. Zdenko 2018-04-25 14:55 GMT+02:00 : > Hello there, > > i don't know what to do anymore... > I want to use tesseract-ocr 3.05 for scanning documents, using the font > "Perfect DOS VGA 437 Win". > Got a traineddata f

Re: [tesseract-ocr] error: required directory

2018-04-25 Thread Zdenko Podobny
We are reorganizing tesseract. Using the latest code is not recommended at all, especially if you do not follow the developers' communications. Zdenko 2018-04-25 19:59 GMT+02:00 Marius Amado-Alves : > Trying to install on a Mac, cannot pass the autogen.sh step. Any tips > highly apprecia

Re: [tesseract-ocr] just installed, get error messages

2018-04-25 Thread Zdenko Podobny
Why are you building the project from source if you have no clue what you are doing? Based on your other post: you decided to build leptonica without support for common image formats. On Thu 26. 4. 2018, 7:01 Rolf Schumacher wrote: > I just installed from git repository > > tesseract --version shows:

Re: [tesseract-ocr] tesseract 4 beta: openCL useage

2018-04-27 Thread Zdenko Podobny
If you have experience, your help will be warmly welcomed. OpenCL is not maintained and is well on its way to being removed if no maintainer/contributor is found. Anyway, it is not used extensively, so there is room for improvement. Zdenko Fri 27. 4. 2018 at 10:21 Janpieter Sollie wrote:

Re: [tesseract-ocr] tesseract 4 beta: openCL useage

2018-04-27 Thread Zdenko Podobny
enCL. I do not have any experience with neural networks (i'm > just a high-school (no college educated IT-support guy with some knowledge > about OpenCL), so can you recommend me some documentation to understand the > engine of tesseract 4? > > 2018-04-27 10:50 GMT+02:00 Zd

Re: [tesseract-ocr] How to convert hocr to MS word .docx file

2018-05-02 Thread Zdenko Podobny
MS word ;-) 1. rename test.hocr to test.hocr.html 2. open test.hocr.html in a real text editor (e.g. notepad++) and delete lines 2 and 3, otherwise word will produce an error message 3. open test.hocr.html in word. Zdenko Thu 3. 5. 2018 at 1:42 abdu wrote: > Is there a program that
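The first two steps above (rename, then delete lines 2 and 3) could be scripted roughly as follows. This is a sketch under assumptions: the helper names `drop_lines_2_and_3` and `prepare_for_word` are hypothetical, the file is assumed to be UTF-8, and the code removes the second and third lines blindly without checking what they contain. It writes a new file next to the original instead of renaming it.

```python
from pathlib import Path


def drop_lines_2_and_3(hocr_text):
    """Return the text without its second and third lines."""
    lines = hocr_text.splitlines(keepends=True)
    return "".join(lines[:1] + lines[3:])


def prepare_for_word(hocr_path):
    """Write <name>.html next to the hOCR file, minus lines 2 and 3.

    For example, test.hocr becomes test.hocr.html; the original is kept.
    """
    src = Path(hocr_path)
    dst = src.with_name(src.name + ".html")
    text = src.read_text(encoding="utf-8")
    dst.write_text(drop_lines_2_and_3(text), encoding="utf-8")
    return dst
```

The resulting file can then be opened in Word as in step 3.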
