Re: Disable Special characters?

2010-04-18 Thread zdenko podobny
Hello,

if I understood "Comment by ffournel, Mar 30, 2010" on
http://code.google.com/p/tesseract-ocr/wiki/FAQ correctly, we can achieve the same
behavior by creating a config file (e.g. digits in the directory
tessdata/configs/) with the line:

tessedit_char_whitelist 0123456789

and then running:

C:>tesseract.exe nine.tif out tessdata/configs/nobatch tessdata/configs/digits
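
As a rough illustration of the config-file approach described above, the following C++ sketch writes such a file. The helper names whitelist_config_line and write_whitelist_config are made up for this example and are not part of Tesseract; only the variable name tessedit_char_whitelist and the file location come from the message.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not part of Tesseract): builds the single config line
// that restricts recognition to the given characters.
std::string whitelist_config_line(const std::string& allowed_chars) {
    assert(!allowed_chars.empty());  // an empty whitelist would be meaningless
    return "tessedit_char_whitelist " + allowed_chars + "\n";
}

// Hypothetical helper: writes that line to a config file such as
// "tessdata/configs/digits". Returns false if the file cannot be written.
bool write_whitelist_config(const std::string& path,
                            const std::string& allowed_chars) {
    std::ofstream out(path.c_str());
    if (!out) return false;
    out << whitelist_config_line(allowed_chars);
    out.flush();  // make sure write errors show up in the stream state
    return static_cast<bool>(out);
}
```

Calling write_whitelist_config("tessdata/configs/digits", "0123456789") should produce the file referred to in the command above, assuming the directory already exists.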

Zd

On Sun, Apr 18, 2010 at 7:50 PM, MARTIN Pierre  wrote:

> Dear NGuyenQ,
>
> From the page http://www.pixel-technology.com/freeware/tessnet2/
> tessnet2.Tesseract ocr = new tessnet2.Tesseract();
> ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
>
> This is brilliant advice you just gave him. It is very effective; I just
> tested it on a document with only digits and a few special characters.
> Since I'm working with C++ only (no .NET wrapper), here is what I recommend
> doing:
>
> // Init your tess API.
> _tessApi = new tesseract::TessBaseAPI();
> // Set up the current directory and language prefix.
> _tessApi->Init("./", "cst");
> // This is only important if you'll be parsing pictures with only one
> // line of text (which is my case).
> _tessApi->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
> // Here is the trick as explained and pointed out by NGuyenQ:
> _tessApi->SetVariable("tessedit_char_whitelist", "<0123456789");
> // Then, in a loop over my documents, here is the idea:
> PIX *pix = pixReadMemTiff((const l_uint8*)buffer.buffer().constData(),
>                           buffer.size(), 0);
> _tessApi->SetImage(pix);
> // Retrieve the recognized text; the caller owns the returned buffer.
> char *text = _tessApi->GetUTF8Text();
> doc.setRecognizedData("OCRLine", QString(text).trimmed());
> delete [] text;
> // pixDestroy() frees the image and nulls the pointer; no delete is needed.
> pixDestroy(&pix);
> // Release everything.
> _tessApi->Clear();
> _tessApi->End();
> delete _tessApi;
>
> The very interesting part is that before, I was getting "D" and "O"
> instead of zeros, sometimes even "A" for "4", and "[]" and "[)" instead of
> zeros, despite my disambiguation file. Now I'm getting everything correct,
> which means the *whitelist / blacklist are not just post-processing
> filters, but real "recognition clues"*.
>
> I recommend everyone take note (well... I'm discovering this feature and
> its real consequences, maybe you're not :D).
>
> Pierre.
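
To make the distinction in Pierre's message concrete, here is a toy sketch of what a pure post-processing filter would do. The function post_filter is hypothetical and is not part of Tesseract. A filter like this can only discard a misread "D" or "O"; it cannot turn it back into "0", which is why the corrected zeros reported above suggest the whitelist acts during recognition.

```cpp
#include <cassert>
#include <string>

// Toy post-processing filter (an assumption for illustration, not Tesseract
// code): keeps only the characters that appear in the whitelist.
std::string post_filter(const std::string& ocr_text,
                        const std::string& whitelist) {
    assert(!whitelist.empty());
    std::string kept;
    for (char c : ocr_text) {
        // Characters outside the whitelist are dropped, never corrected.
        if (whitelist.find(c) != std::string::npos) kept += c;
    }
    return kept;
}
```

For instance, if the true text were "10203" and the engine misread it as "1D2O3", this filter would return "123": the two zeros are lost rather than recovered.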
>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-...@googlegroups.com.
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> .
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>




Re: Training Tesseract 3

2010-04-23 Thread zdenko podobny
Hello,

Tesseract 3.0 is in svn:
http://code.google.com/p/tesseract-ocr/source/checkout (source code). Some
information can be found at
http://code.google.com/p/tesseract-ocr/wiki/ReadMe (Installation Notes -
3.00 Prerelease).

Zd.

On Thu, Apr 22, 2010 at 10:48 AM, Ayatullah  wrote:

> What is the exact version of Tesseract 3? Is it 2.03 or 3.0?
>
> I could not find Tesseract 3.0 anywhere.
>
> On Apr 12, 11:55 am, rkvsraman  wrote:
> > Hello,
> >
> > I am not able to find the training manual for tesseract 3. Please
> > point me to one.
> >
> > Thanks
> >
> > -Raman



Re: Tesseract 3.0 without page layout analysis?

2010-04-23 Thread zdenko podobny
Hello,

http://code.google.com/p/tesseract-ocr/wiki/ReadMe, section Installation
Notes - 3.00 Prerelease:
In the executable, page layout analysis is enabled by default. You may need
to turn it off to process small images. No command-line control for this
yet. Sorry. See tesseractmain.cpp.

Zd.

On Wed, Apr 21, 2010 at 10:08 AM, Jan  wrote:

> Hello,
> is it possible to use tesseract 3.0 without page layout analysis, or
> in one column mode?
> Especially using the tesseract.exe?
> Thanks!!



Re: Extracting files from .tessdata

2010-04-28 Thread zdenko podobny
Hello Ramon,

To extend an existing language you need "tif/box pairs"; see
http://code.google.com/p/tesseract-ocr/wiki/FAQ, under "How do I add just
one character or one font to my favourite language, without having to
retrain from scratch?"

Unfortunately, tif/box pairs are provided only for the eng, deu, fra, ita, nld
and spa languages. So you can either wait until somebody releases
tif/box pairs for your language, or start training from scratch. I
chose the second option, and this is the reason why I started testing the
training process for Tesseract 3.00.

BR,

Zdenko


On Mon, Apr 26, 2010 at 11:06 AM, Ramon  wrote:

> Hi,
> After some tests I realized the best option for me is to put effort into
> extending the Catalan dictionary which is in the svn repository (v3).
> It would be very useful if you could do one of these:
>
> -> deliver the different files that are combined to create the cat.traineddata
> unified file (the utf8 files used to generate the dawg would also be
> amazing!).
> -> show how to extract these files from cat.traineddata and how to run
> dawg2utf8 (if it is possible).
>
> THANKS!



Re: Tesseract 3.0 without page layout analysis?

2010-04-28 Thread zdenko podobny
If you find out how to turn it off, please share this info ;-)

Zd.

On Sun, Apr 25, 2010 at 5:43 PM, Jan  wrote:

> Thanks for the info, then I will try to change it in
> tesseractmain.cpp.
>
> Jan
>
>
>
> On 23 Apr., 09:38, zdenko podobny  wrote:
> > Hello,
> >
> > http://code.google.com/p/tesseract-ocr/wiki/ReadMe, section Installation
> > Notes - 3.00 Prerelease:
> > In the executable, page layout analysis is enabled by default. You may need
> > to turn it off to process small images. No command-line control for this
> > yet. Sorry. See tesseractmain.cpp.
> >
> > Zd.
> >
> >
> >
> > On Wed, Apr 21, 2010 at 10:08 AM, Jan  wrote:
> > > Hello,
> > > is it possible to use tesseract 3.0 without page layout analysis, or
> > > in one column mode?
> > > Especially using the tesseract.exe?
> > > Thanks!!



Re: Cannot run tesseract.exe

2010-05-13 Thread zdenko podobny
How did you install Tesseract?


2010/5/13 Mehmet Can Altıgül 

> Hi guys,
>
> I have been trying to run tesseract.exe but it throws this error: "Unable
> to load unicharset file ./tessdata/eng.unicharset"
>
> I use this command: "tesseract.exe ocr.bmp xx.txt"
>
> It seems the English unicharset file is missing. How can I overcome this?
>
> Thank you all!



Re: Spaces situation in Training image

2010-05-23 Thread zdenko podobny
Hello,

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract:

It is *ABSOLUTELY VITAL* to space out the text a bit when printing, so up
the inter-character and inter-line spacing in your word processor. Not
spacing text out sufficiently will cause "FAILURE! box overlaps no blobs or
blobs in multiple rows" errors during tr file generation, which leads to
FATALITY - 0 labelled samples of "x", which leads to "Error: X classes in
inttemp while unicharset contains Y unichars" and you can't use your nice
new data files. This situation will improve in the future, as we are working
on a solution, but for 3.00 APPLY_BOXES errors remain the most problematic
difficulty for people training tesseract.


So you do not need to add space between words, but it is important to
have enough space between characters.

Best regards,

Zdenko

2010/5/1 M. Bashir Al-Noimi 

>  Hello again,
>
> Does anyone know the answer? I trained Tesseract and didn't find any
> difference.
>
> On 30/04/2010 10:18 PM, M. Bashir Al-Noimi wrote:
>
> Hi All,
>
> As I noticed in the training images for English, French and Dutch, nearly all
> the characters are grouped as words, so I'm asking: does grouping characters in
> the training image affect the recognition process?
>
> For example, in eng.arial.g4.tif I captured the following crop:
>
> [image: eng.arial.g4.png]
>
> If I input the following crop, does it give me the same recognition result
> as the one above?
>
> [image: eng.arial.g4_test.png]
>
>
> --
> Best Regards
> Muhammad Bashir Al-Noimi
> My Blog: http://mbnoimi.net

Re: Integrating Tesseract with another open source project

2010-05-23 Thread zdenko podobny
Hi,

It would be better to use the forum than to contact me (I am not a programmer; I
am just a user who tries to read the documentation :-) ).

Zd.

On Sat, May 22, 2010 at 7:38 PM, Thilanka  wrote:

> Hi Zdenko,
>
> Thank you very much for the tips. I'll contact you if
> I face any problem on this.
>
> Regards,
> Thilanka.
>
> On May 22, 1:40 pm, Zdenko Podobný  wrote:
> > see http://code.google.com/p/tesseract-ocr/wiki/ReadMe:
> >
> > Another important change is that you should *really* be using
> > TessBaseAPI if you are linking with another program. In Linux
> > (non-Windows) the main library is now libtesseract_api.a instead of
> > the old libtesseract_full.a. In windows, use the define
> > TESSDLL_IMPORTS before including baseapi.h in your code to get the
> > symbols of the TessBaseAPI class.
> >
> > Zd.
> >
> > On 21.05.2010 19:21, Thilanka wrote:
> >
> >
> >
> > > Hi,
> >
> > > I'm working on the Sahana OCR project for my GSoC session.
> > > I'm planning to use Tesseract for character recognition in
> > > the Sahana OCR project (it is an open-source project). The Sahana OCR
> > > code is written in Visual C++. We cannot use the Tesseract exe for
> > > our project, so I'm planning to combine the Tesseract code with the Sahana
> > > OCR code. But I don't have a good understanding of the Tesseract
> > > architecture or of how I can integrate the source code of
> > > Sahana and Tesseract. Could someone please help me with this
> > > problem?
> >
> > > Regards,
> > > Thilanka.
> >
> > > --
> > > http://coders-view.blogspot.com/
> > > http://thilankagekawuluwa.blogspot.com/
> > > http://twitter.com/thilanka_k



Re: Danish fraktur support in r319

2010-05-25 Thread zdenko podobny
Did you try Google ;-)? There are plenty of examples, e.g.:
http://wiki.creativecommons.org/HOWTO_Patch

Zd.

On Tue, May 25, 2010 at 2:53 PM, Sriranga(77yrsold)  wrote:

> Jimmy,
> How do I do that? Alternatively, would you kindly forward a copy of the
> patched cpp file for re-testing and feedback?
> With regards,
> -sriranga(77yrsold)
>
>
> On Tue, May 25, 2010 at 6:16 PM, Jimmy O'Regan  wrote:
>
>> On 25 May 2010 13:16, Sriranga(77yrsold)  wrote:
>> > Jimmy,
>> > Could you kindly advise whether the contents of 303.patch should be
>> > copied and pasted into image/svshowim.cpp?
>> > With regards,
>> > -sriranga(77yrsold)
>> >
>> >
>>
>> Err no; it's a patch. It's meant to be applied using the patch
>> program.
>>
>> --
>>  jimregan, that's because deep inside you, you are evil.
>>  Also not-so-deep inside you.
>>



Re: PLEASE GIVE ME THE CODE OF VERSION 3 OF TESSERACT-OCR

2010-05-26 Thread zdenko podobny
http://code.google.com/p/tesseract-ocr/source/checkout

On Wed, May 26, 2010 at 11:14 AM, sushovon  wrote:

> Could anyone please mail me, or tell me where I can download, the code
> along with a build of version 3, which is yet to be published? I badly
> need this. Please, anyone, help me.



Re: Call for testers...

2010-05-27 Thread zdenko podobny
It looks like you did not run './runautoconf' before './configure'.

Zd.

On Thu, May 27, 2010 at 2:34 PM, Karl Wettin  wrote:

> On 26 May 2010 at 16:23, Jimmy O'Regan wrote:
>
>> I've just updated the SVN version to use libtool (and shared
>> libraries, that sort of thing) but it's only tested on Ubuntu Lucid.
>>
>> Anyone care to take it for a test run?
>
> OS X via MacPorts using gcc43 on a fresh svn co.
>
> ./configure ends with:
>
> config.status: executing libtool commands
> sed: config/ltmain.sh: No such file or directory
> sed: config/ltmain.sh: No such file or directory
> mv: rename libtoolT to libtool: No such file or directory
> cp: libtoolT: No such file or directory
> chmod: libtool: No such file or directory
> config.status: executing depfiles commands
>
> Configuration is done.
> You can now build tesseract by running:
>
>
> Of course make then fails:
>
> make  all-recursive
> Making all in ccstruct
> /bin/sh ../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I. -I..
>  -I../ccutil -I../cutil -I../image -I../viewer -I/opt/local/include  -g -O2
> -MT blobbox.lo -MD -MP -MF .deps/blobbox.Tpo -c -o blobbox.lo blobbox.cpp
> mv -f .deps/blobbox.Tpo .deps/blobbox.Plo
> mv: rename .deps/blobbox.Tpo to .deps/blobbox.Plo: No such file or
> directory
> make[3]: *** [blobbox.lo] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all] Error 2
>
>
> karl



Re: newest release of tesseract

2010-06-08 Thread zdenko podobny
Hello,

V3.00 has not been released yet. I am not sure what the criteria for "stable"
are (you can compile it and run it ;-) ). You can get the code via svn:
 http://code.google.com/p/tesseract-ocr/source/checkout

Zd.

On Mon, May 31, 2010 at 4:43 PM, butch  wrote:

> I have seen mention "tesseract 3.0" in some posts.
>
> I am currently using V2.04
>
> Is V3.xxx stable?  If so where can I find it.
>
> Distributor ID: Ubuntu
> Description:    Ubuntu 9.10
> Release:        9.10
> Codename:       karmic
>
> I am running on a x86_64 machine.
>
> thanks?



Re: Forking tesseract.

2010-06-10 Thread zdenko podobny
No what?

BTW: this question was intended for Ray Smith/Google or the provider of the
data in svn. Do you speak on their behalf?

Zd.

On Thu, Jun 10, 2010 at 4:34 AM, Elmer Fittery wrote:

> Sorry but no.
>
> On Wed, 2010-06-09 at 22:17 +0200, Zdenko Podobný wrote:
> > Hello,
> >
> > do you intend to release also tiff/box files for (new) languages (in )
> >
> > Can you provide some short example for punc-dawg and number-dawg file?
> >
> > BR,
> >
> > Zd.
> >
> >
> > On 25.05.2010 06:44, Ray Smith wrote:
> > > I would be very happy for someone to take over maintenance of the
> > > autotools part of tesseract. Even better if a team of you can do it...
> > > I don't get much time to deal with that, and it doesn't get much
> > > priority, since we have our own build system, and windows has to have
> > > its own. With someone looking after the build side, I am hopeful that,
> > > after 3.00 becomes a tarball, I can keep the svn trunk fully up-to-date
> > > with the source code and then maybe you guys can decide when it is a
> > > good time to make a new tarball release.
> > >
> > > I made a big hole in the issues list last week, and will attempt to
> > > work through the rest this week, as there are useful patches in there
> > > that should be applied, and useful bug reports for bugs that can be
> > > fixed. With the issues list down to a more manageable size, it should
> > > be easier to keep up with it. There is too much for me to manage on my
> > > own though, and it is overwhelming to see that just about every wiki
> > > page has as many comments attached as there are open issues.
> > >
> > > I saved a lot of time by putting a filter on the forum, but that meant
> > > I didn't look at it either, which is not satisfactory. I created the
> > > tesseract-dev forum for developers specifically, but it didn't take
> > > off. It would help to have a division between the more mundane parts of
> > > the forum and the other items that require my specific attention.
> > >
> > > So please, anyone who wants to help maintain this site, rather than
> > > fork it, let me know, and I will add you to the list of developers. We
> > > are still actively developing the code at Google, and I want to be able
> > > to get the code out where people can use it.
> > >
> > > Ray.
> > >
> > > On Fri, May 21, 2010 at 5:17 AM, Jimmy O'Regan wrote:
> > >
> > > > On 14 May 2010, at 14:20, MARTIN Pierre wrote:
> > > >
> > > > > > > I have created new autotools files so that Tesseract can be
> > > > > > > built as shared libraries (using libtool), which would allow
> > > > > > > other projects to link against it much more easily.
> > > > > > > Unfortunately, the Linux distributions (admittedly just Gentoo
> > > > > > > so far) are reluctant to use these changes without them being
> > > > > > > accepted upstream.
> > > > > >
> > > > > > I sympathize with your position. For over a year, I have been
> > > > > > maintaining a local branch tracking the tesseract-ocr svn trunk
> > > > > > with some patches applied that do pretty much the same thing
> > > > > > you're describing, for some personal projects. I've also been
> > > > > > building my own .debs for Ubuntu for easy deployment in some
> > > > > > projects I'm working on.
> > > > >
> > > > > I'm still very enthusiastic about this project of forking
> > > > > Tesseract. But as I said before, I won't do it alone, and I had
> > > > > not heard about you guys. What amount of time and what skills
> > > > > could you dedicate to this project?
> > > >
> > > > FWIW, there has been some recent activity in SVN, and several issues
> > > > that had patches attached have been committed. If you haven't
> > > > already submitted an issue+patch, perhaps now is the time to do so.

Re: *** glibc detected *** tesseract: double free or corruption

2010-07-12 Thread zdenko podobny
Hello,

How did you install Tesseract? Which version?
Please provide more information.

Zd.

On Sun, Jul 11, 2010 at 6:16 PM, msjs08  wrote:

>
> I've installed Tesseract on Mandriva 2010 (64 bit) and I can't get it to
> run.
> It just segfaults.
> I installed gimagereader. This is the error I got when I tried to use
> gimagereader
>
> [r...@desktop test extract]# tesseract slide7.tif textfile.txt
> Tesseract Open Source OCR Engine
> *** glibc detected *** tesseract: double free or corruption (!prev):
> 0x00a80de0 ***
> === Backtrace: =
> /lib64/libc.so.6[0x7fa9e5956bf6]
> /lib64/libc.so.6(cfree+0x6f)[0x7fa9e595b6bf]
> /lib64/libc.so.6(__cxa_finalize+0xa5)[0x7fa9e591a5f5]
> /usr/lib64/libtesseract_full.so.2[0x7fa9e68b97b6]
> === Memory map: 
> 0040-00537000 r-xp  08:05 179583
> /usr/bin/tesseract
> 00737000-0073a000 r--p 00137000 08:05 179583
> /usr/bin/tesseract
> 0073a000-00741000 rw-p 0013a000 08:05 179583
> /usr/bin/tesseract
> 00741000-007cc000 rw-p  00:00 0
> 00a72000-00dd9000 rw-p  00:00 0
>  [heap]
> 7fa9e000-7fa9e0021000 rw-p  00:00 0
> 7fa9e0021000-7fa9e400 ---p  00:00 0
> 7fa9e528d000-7fa9e52a2000 r-xp  08:05 131442
> /lib64/libz.so.1.2.3
> 7fa9e52a2000-7fa9e54a1000 ---p 00015000 08:05 131442
> /lib64/libz.so.1.2.3
> 7fa9e54a1000-7fa9e54a2000 rw-p 00014000 08:05 131442
> /lib64/libz.so.1.2.3
> 7fa9e54a2000-7fa9e54d8000 r-xp  08:05 131605
> /usr/lib64/libjpeg.so.7.0.0
> 7fa9e54d8000-7fa9e56d8000 ---p 00036000 08:05 131605
> /usr/lib64/libjpeg.so.7.0.0
> 7fa9e56d8000-7fa9e56d9000 r--p 00036000 08:05 131605
> /usr/lib64/libjpeg.so.7.0.0
> 7fa9e56d9000-7fa9e56da000 rw-p 00037000 08:05 131605
> /usr/lib64/libjpeg.so.7.0.0
> 7fa9e56da000-7fa9e56e3000 r-xp  08:05 133782
> /usr/lib64/libjbig.so.1.0.0
> 7fa9e56e3000-7fa9e58e2000 ---p 9000 08:05 133782
> /usr/lib64/libjbig.so.1.0.0
> 7fa9e58e2000-7fa9e58e3000 r--p 8000 08:05 133782
> /usr/lib64/libjbig.so.1.0.0
> 7fa9e58e3000-7fa9e58e6000 rw-p 9000 08:05 133782
> /usr/lib64/libjbig.so.1.0.0
> 7fa9e58e6000-7fa9e5a3a000 r-xp  08:05 130868
> /lib64/libc-2.10.1.so
> 7fa9e5a3a000-7fa9e5c3a000 ---p 00154000 08:05 130868
> /lib64/libc-2.10.1.so
> 7fa9e5c3a000-7fa9e5c3e000 r--p 00154000 08:05 130868
> /lib64/libc-2.10.1.so
> 7fa9e5c3e000-7fa9e5c3f000 rw-p 00158000 08:05 130868
> /lib64/libc-2.10.1.so
> 7fa9e5c3f000-7fa9e5c44000 rw-p  00:00 0
> 7fa9e5c44000-7fa9e5c5a000 r-xp  08:05 131433
> /lib64/libgcc_s-4.4.1.so.1
> 7fa9e5c5a000-7fa9e5e59000 ---p 00016000 08:05 131433
> /lib64/libgcc_s-4.4.1.so.1
> 7fa9e5e59000-7fa9e5e5a000 rw-p 00015000 08:05 131433
> /lib64/libgcc_s-4.4.1.so.1
> 7fa9e5e5a000-7fa9e5edb000 r-xp  08:05 210438
> /lib64/libm-2.10.1.so
> 7fa9e5edb000-7fa9e60da000 ---p 00081000 08:05 210438
> /lib64/libm-2.10.1.so
> 7fa9e60da000-7fa9e60db000 r--p 0008 08:05 210438
> /lib64/libm-2.10.1.so
> 7fa9e60db000-7fa9e60dc000 rw-p 00081000 08:05 210438
> /lib64/libm-2.10.1.so
> 7fa9e60dc000-7fa9e61c9000 r-xp  08:05 131450
> /usr/lib64/libstdc++.so.6.0.12
> 7fa9e61c9000-7fa9e63c9000 ---p 000ed000 08:05 131450
> /usr/lib64/libstdc++.so.6.0.12
> 7fa9e63c9000-7fa9e63d r--p 000ed000 08:05 131450
> /usr/lib64/libstdc++.so.6.0.12
> 7fa9e63d-7fa9e63d2000 rw-p 000f4000 08:05 131450
> /usr/lib64/libstdc++.so.6.0.12
> 7fa9e63d2000-7fa9e63e7000 rw-p  00:00 0
> 7fa9e63e7000-7fa9e63fd000 r-xp  08:05 130892
> /lib64/libpthread-2.10.1.so
> 7fa9e63fd000-7fa9e65fc000 ---p 00016000 08:05 130892
> /lib64/libpthread-2.10.1.so
> 7fa9e65fc000-7fa9e65fd000 r--p 00015000 08:05 130892
> /lib64/libpthread-2.10.1.so
> 7fa9e65fd000-7fa9e65fe000 rw-p 00016000 08:05 130892
> /lib64/libpthread-2.10.1.so
> 7fa9e65fe000-7fa9e6602000 rw-p  00:00 0
> 7fa9e6602000-7fa9e6663000 r-xp  08:05 133791
> /usr/lib64/libtiff.so.3.9.1
> 7fa9e6663000-7fa9e6862000 ---p 00061000 08:05 133791
> /usr/lib64/libtiff.so.3.9.1
> 7fa9e6862000-7fa9e6864000 r--p 0006 08:05 133791
> /usr/lib64/libtiff.so.3.9.1
> 7fa9e6864000-7fa9e6865000 rw-p 00062000 08:05 133791
> /usr/lib64/libtiff.so.3.9.1
> 7fa9e6865000-7fa9e6a18000 r-xp  08:05 142456
> /usr/lib64/libtesseract_full.so.2.0.4
> 7fa9e6a18000-7fa9e6c18000 ---p 001b3000 08:05 142456
> /usr/lib64/libtesseract_full.so.2.0.4
> 7fa9e6c18000-7fa9e6c1b000 r--p 001b3000 08:05 142456
> /usr/lib64/libtesseract_full.so.2.0.4
> 7fa9e6c1b000-7fa9e6c26000 rw-p 001b6000 08:05 142456
> /usr/lib64/libtesseract_full.so.2.0.4
> 7fa9e6c26000-7fa9e6cb6000 rw-p  00:00 0
> 7fa9e6cb6000-7fa9e6cd2000 r-xp  08:05 130861
> /lib64/ld-2.10.1.so
> 7fa9e6eb-7fa9e6eb5000 rw-p  00:00 0
> 7fa9e6ecf000-7fa9e6ed1000 rw-p  00:00 0
> 7fa9e6ed1000-7fa9e6ed2000 r--p 0001b000 08:05 130861
> /lib64/ld-2.10.1.so
> 7fa9e6ed2000-7fa9e6ed3000 rw-p 0001c000 08:05 130861
> /lib64/ld-2.10.1.so
> 7fff0fbb3000-7fff0fbec000 rw-p  00:00 0
>  [stack]
> 7fff0fbff00

Re: Problem using DangAmbigs and user-words files

2010-08-06 Thread zdenko podobny
Did you try the latest revision (r449)?

Zd.

On Wed, Aug 4, 2010 at 3:52 PM, caro  wrote:

> Can someone help me?
>
> thank you
>
> On Jul 20, 4:18 pm, caro  wrote:
> > I am trying to complete these files after looking at the errors that
> > appear during recognition.
> > Typically, I get the following error, which occurs very often:
> > tesseract recognizes FESLLTS instead of RESULTS
> >
> > So I added to the user-words file: RESULTS
> > and to the DangAmbigs file:
> > 2 F E 2 R E
> > 2 L L 2 U L
> > 1 F 1 R
> > 1 L 1 U
> >
> > But adding this does not change anything, and the OCR still
> > finds FESLLTS instead of RESULTS.
> > Any idea what I am doing wrong?
> >
> > Thank you for your help,
> > Caroline
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-...@googlegroups.com.
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> .
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to tesseract-...@googlegroups.com.
To unsubscribe from this group, send email to 
tesseract-ocr+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.



Announcement: new version of pyTesseractTrainer available

2010-08-13 Thread zdenko podobny
Hello,

I would like to announce the new version 1.01 of pyTesseractTrainer, the
successor of tesseractTrainer.py. Version 1.00 is identical to
tesseractTrainer.py.

Features:

   - visual editor of box file
   - layout of symbols from the box file reflects the symbols on the image
   - possibility to define bold, italic, and underline font
   - deleting, joining, splitting of symbols/boxes
   - easy and exact way of adjusting boxes
   - support for opening different image formats (tiff, png, jpeg, bmp, gif)
   - multi-platform support (tested on Linux 64 bit and Windows XP)

Bugfixes (in 1.01):

   - unicode support
   - opening of tesseract v3.00 box files (but saving supports only the v2.0x
   box file format)
   - identify/ImageMagick is not needed anymore
   - fixed the error that blocked opening files on Windows
   - solved issues with training the symbols @ and $ (also used to identify
   bold and italic fonts)
   - workaround for missing Numeric support in PyGTK


Because, AFAIK, nobody reacted to Catalin's e-mail, I offered to create a
project to collect patches and possibly to solve known issues. Because my
time is limited, the project is still looking for an owner/contributors.
Experts in Python (multi-platform) GUI toolkits (GTK/Qt/wx...) are warmly
welcome because of performance issues: on Windows XP (2 GB memory) the script
crashes or freezes while opening a file with a lot of boxes/symbols (e.g.
eng.arial.g4.tif), and on Mandriva Linux 2010.1 64-bit (6 GB memory) it takes
15 minutes to open and display it!

BR,

Zd.




Re: Announcement: new version of pyTesseractTrainer available

2010-08-21 Thread zdenko podobny
Hi,

your problem is that you use tesseractTrainer.py, which was written in 2007,
and not pyTesseractTrainer.py (2010), which corrected this issue. I would
suggest using
http://code.google.com/p/pytesseracttrainer/downloads/detail?name=pyTesseractTrainer-1.01.py
or (if you are brave enough, the devel version:
http://pytesseracttrainer.googlecode.com/svn/trunk/pyTesseractTrainer.py).
In that case you do not need to solve problems that were already solved.

Anyway, please post issues regarding tesseractTrainer.py/pyTesseractTrainer.py
to http://code.google.com/p/pytesseracttrainer/issues/list or
pytesseracttrainer-us...@googlegroups.com

BR,

Zd.

On Sat, Aug 21, 2010 at 10:39 AM, tt  wrote:

> This Trainer variant won't open v3 box file:
> Traceback (most recent call last):
>   File "/home/ty/files/tesseractTrainer.py", line 546, in doFileOpen
>     self.loadImageAndBoxes(fileName, chooser)
>   File "/home/ty/files/tesseractTrainer.py", line 471, in loadImageAndBoxes
>     self.boxes = loadBoxData(boxName, height)
>   File "/home/ty/files/tesseractTrainer.py", line 129, in loadBoxData
>     (text, left, bottom, right, top) = line.split()
> ValueError: too many values to unpack
>
> It needs something like this diff to proceed (I made this recently for
> own use, and I didn't care about 6th field semantics, yet):
>
> --- tesseractTrainer.py.prev___^2009-04-07 12:18:08.0 +0300
> +++ tesseractTrainer.py^2010-08-17 12:05:31.0 +0300
> @@ -60,6 +60,7 @@
> right = 0
> top = 0
> bottom = 0
> +something = 0
> bold = False
> italic = False
> underline = False
> @@ -126,7 +127,8 @@
> prevRight = -1
> .
> for line in f:
> -(text, left, bottom, right, top) = line.split()
> +#print "%s\n" % (line)
> +(text, left, bottom, right, top, something) = line.split()
> s = Symbol()
> .
> if (text.startswith('@')):
> @@ -589,9 +596,9 @@
> if s.bold:
> text = '@' + text
> #endif
> -f.write('%s %d %d %d %d\n' %
> +f.write('%s %d %d %d %d %d\n' %
> (text, s.left, height - s.bottom, s.right,
> - height - s.top))
> + height - s.top, s.something))
> #endfor
> #endfor
> f.close()
>
>
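
The idea behind the diff above can be sketched in Python as a reader that
accepts both the 5-field box lines of Tesseract 2.0x and the 6-field lines of
Tesseract 3.00. This is only a minimal sketch: the function name
parse_box_line is hypothetical and is not part of tesseractTrainer.py, and
treating the sixth field as a page number is an assumption (the diff itself
leaves its meaning open).

```python
def parse_box_line(line):
    """Parse one box-file line in the 5-field or the 6-field layout."""
    fields = line.split()
    if len(fields) == 5:
        # Older layout: symbol and four coordinates, no sixth field.
        text, left, bottom, right, top = fields
        page = 0
    elif len(fields) == 6:
        # Newer layout: the extra field is assumed to be a page number.
        text, left, bottom, right, top, page = fields
    else:
        raise ValueError("unexpected box line: %r" % line)
    return text, int(left), int(bottom), int(right), int(top), int(page)
```

Keeping the sixth value instead of discarding it lets a tool write the same
line back unchanged when the box file is saved.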



Re: Which revision of tesseract 3.0 for win7 64bit

2010-08-23 Thread zdenko podobny
On Thu, Aug 19, 2010 at 11:45 PM, Max  wrote:

>
> On Aug 19, 11:49 am, "Jimmy O'Regan"  wrote:
> > On 19 August 2010 11:23, Joe Degenhardt 
> wrote:
> >
> > No, that's the state of things.
> >
>
> hmm... The latest code compiles and works for me  :).  Maybe I should
> have mentioned that only the release mode is compilable and runnable
> "out of the box", since leptonlibd.dll (debug version of leptonica) is
> missing on svn.
>
> >The latest revision in
> >the svn can be compiled and does not crash so far but stops with an
> >error message which has already been reported in issue 345.
>
> To work around issue 345 you can revert change #r448 .
>
> max
>
There is a new version (1.66) of leptonica [1] that includes precompiled
Windows libraries. I copied lib/*.lib from
leptonica-1.66-win32-lib-include-dirs.zip to tesseract\lib
and lib/*.dll to the tesseract\ directory.

Then I was able to build the Debug and Release versions of tesseract, and
these commands work :-):

tesseract.exe phototest.tif phototest
tesseract.exe phototest.tif phototest batch.nochop makebox
tesseract.exe phototest.tif phototest nobatch box.train

This is the first time I was able to compile tesseract on Windows and have it
produce output :-). I have almost no experience with compiling software on
Windows, so it would be great if somebody could check this or provide a better
process.
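
The copy step described above could be scripted roughly as follows. This is a
sketch under assumptions: the function name copy_leptonica_binaries and both
directory arguments are hypothetical, and the layout (a lib directory inside
the unpacked leptonica archive) is taken from the description in this message.

```python
import glob
import os
import shutil


def copy_leptonica_binaries(leptonica_dir, tesseract_dir):
    """Copy lib/*.lib to <tesseract>/lib and lib/*.dll to <tesseract>/."""
    lib_src = os.path.join(leptonica_dir, "lib")
    lib_dst = os.path.join(tesseract_dir, "lib")
    os.makedirs(lib_dst, exist_ok=True)
    copied = []
    # Import libraries go next to the other libraries used by the build.
    for path in sorted(glob.glob(os.path.join(lib_src, "*.lib"))):
        copied.append(shutil.copy(path, lib_dst))
    # DLLs go into the top-level tesseract directory.
    for path in sorted(glob.glob(os.path.join(lib_src, "*.dll"))):
        copied.append(shutil.copy(path, tesseract_dir))
    return copied
```

The function returns the list of destination paths, which makes it easy to
check that both kinds of file were actually found.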

[1] http://code.google.com/p/leptonica/downloads/list

BR,

Zd.




Re: Which revision of tesseract 3.0 for win7 64bit

2010-08-23 Thread zdenko podobny
On Mon, Aug 23, 2010 at 1:19 PM, zdenko podobny  wrote:

> On Thu, Aug 19, 2010 at 11:45 PM, Max  wrote:
>
>>
>> On Aug 19, 11:49 am, "Jimmy O'Regan"  wrote:
>> > On 19 August 2010 11:23, Joe Degenhardt 
>> wrote:
>> >
>> > No, that's the state of things.
>> >
>>
>> hmm... The latest code compiles and works for me  :).  Maybe I should
>> have mentioned that only the release mode is compilable and runnable
>> "out of the box", since leptonlibd.dll (debug version of leptonica) is
>> missing on svn.
>>
>> >The latest revision in
>> >the svn can be compiled and does not crash so far but stops with an
>> >error message which has already been reported in issue 345.
>>
>> To work around issue 345 you can revert change #r448 .
>>
>> max
>>
> There is a new version (1.66) of leptonica [1] that includes precompiled
> Windows libraries. I copied lib/*.lib from
> leptonica-1.66-win32-lib-include-dirs.zip to tesseract\lib
> and lib/*.dll to the tesseract\ directory.
>
> Then I was able to build the Debug and Release versions of tesseract, and
> these commands work :-):
>
> tesseract.exe phototest.tif phototest
> tesseract.exe phototest.tif phototest batch.nochop makebox
> tesseract.exe phototest.tif phototest nobatch box.train
>
> This is the first time I was able to compile tesseract on Windows and have it
> produce output :-). I have almost no experience with compiling software on
> Windows, so it would be great if somebody could check this or provide a better
> process.
>
> [1] http://code.google.com/p/leptonica/downloads/list
>
Just a remark: I compiled tesseract r454 this way in Visual C++ 2008 Express
Edition on Windows XP SP3.




Re: Which revision of tesseract 3.0 for win7 64bit

2010-08-26 Thread zdenko podobny
If somebody wants to play/test, I created a zip package for win32:

http://www.sk-spell.sk.cx/file_download/89/tesseract-ocr-r454-en-win32.zip
http://www.sk-spell.sk.cx/windows-build-of-recent-tesseract-code-revision-454

Zd.

On Tue, Aug 24, 2010 at 1:03 AM, Quan Nguyen  wrote:

> I am able to confirm Tesseract r454 with new Leptonica-1.66 binary ran
> w/o the problem that was reported in Issue 304. Well, with one little
> other problem, though:
>
> Could not open file, ./tessdata/eng.user-words
>
> I had to create an empty file with the name to get it to run. When I
> tried with -l vie, it again put out another error:
>
> Could not open file, ./tessdata/vie.user-words
>
> The program should be able to continue w/o any *.user-words files.
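
The workaround described above (creating an empty file with the expected
name) could be sketched like this. The function name ensure_user_words is
hypothetical; the file naming, <lang>.user-words inside the tessdata
directory, is taken from the error messages quoted above.

```python
import os


def ensure_user_words(tessdata_dir, lang):
    """Create an empty <lang>.user-words file if it does not exist yet."""
    path = os.path.join(tessdata_dir, lang + ".user-words")
    if not os.path.exists(path):
        with open(path, "w"):
            pass  # an empty file is enough for the workaround
    return path
```

An existing file is left untouched, so a real word list is never overwritten.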
>
> Thanks.
>
> On Aug 23, 6:19 am, zdenko podobny  wrote:
> >
> > There is a new version (1.66) of leptonica [1] that includes precompiled
> > Windows libraries. I copied lib/*.lib from
> > leptonica-1.66-win32-lib-include-dirs.zip to tesseract\lib
> > and lib/*.dll to the tesseract\ directory.
> >
> > Then I was able to build the Debug and Release versions of tesseract, and
> > these commands work :-):
> >
> > tesseract.exe phototest.tif phototest
> > tesseract.exe phototest.tif phototest batch.nochop makebox
> > tesseract.exe phototest.tif phototest nobatch box.train
> >
> > This is the first time I was able to compile tesseract on Windows and have
> > it produce output :-). I have almost no experience with compiling software
> > on Windows, so it would be great if somebody could check this or provide a
> > better process.
> >
> > [1]http://code.google.com/p/leptonica/downloads/list
> >
> > BR,
> >
> > Zd.
>



Re: Tesseract Training Problem (under Mac)

2010-09-05 Thread zdenko podobny
Hello,

Tesseract 2.04 does not use a "combined" file, so there is no combine_tessdata.
Just copy your files to the tessdata directory.

At the moment http://code.google.com/p/tesseract-ocr/wiki/TestingTesseract
describes
training for Tesseract 3.0 (with mistakes ;-) - I started to check it, so
soon there will be a correct version). If you want to see the description
for Tesseract 2.04, look at the svn repository:
http://code.google.com/p/tesseract-ocr/source/browse/wiki/TrainingTesseract.wiki?r=318.
It is in wiki syntax, but it is easily readable.

BR,

Zd.

On Sat, Sep 4, 2010 at 5:15 AM, John Smith <4ever...@gmail.com> wrote:

> Hi,
>
> Thank you so much for the reply.
> I just have one more step to make, I am using Tesseract 2.04 now and I've
> got all the files ready, I am trying to combine them all together but there
> is no combine_tessdata for 2.04, I want to know how to combine them under
> 2.04.
>
> Thank you so much!!
>
>
> On Sun, Aug 29, 2010 at 8:30 PM, Jimmy O'Regan  wrote:
>
>> On 28 August 2010 07:45, OCR Newbie <4ever...@gmail.com> wrote:
>> > Hi All,
>> >
>> > Currently I am trying to use Tesseract(2.04) to recognize my own data,
>> > with Mac OS X Snow Leopard.
>> > I find this
>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>> > and I am trying to follow this tutorial.
>> > My questions are:
>> > 1. I already have my train.tif ready, but I am not sure where I should
>> > place the image file (under the 'tessdata' folder, or can it be anywhere)?
>>
>> If you're running 'tesseract train.tif ...', it just needs to be in
>> the current directory.
>>
>> > 2. About running tesseract on my training image: it asks to run
>> > 'tesseract train.tif train batch.nochop makebox'. I guess I should
>> > use the terminal, but when I type this command into it, it keeps saying
>> > 'tesseract command not found'. I tried to run configure in the terminal
>> > first and typed 'make', but it is still not working.
>>
>> You also need to use 'make install', or provide a path to the
>> executable - Unix-like systems (unlike DOS, etc.) do not include the
>> current directory in the executable search path. (You can, of course,
>> change that but it's A Bad Idea.)
>>
>> If tesseract is in /home/jim and $PWD (use 'echo $PWD') is /home/jim I
>> could use:
>> ./tesseract ...
>> ('.' means 'this directory')
>> /home/jim/tesseract
>> (the full path)
>> or even
>> ../jim/tesseract
>> ('..' means the parent directory, one level up - in this case, '/home')
>> or even:
>> $PWD/tesseract
>>
>> ($PWD is an environment variable, and will always be there... unless
>> you remove it from another shell, but you probably don't need to worry
>> about that).
>>
>> I think MacOS uses /Users or something else, just substitute with
>> actual values. Using 'make install' will be more convenient, though.
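
The search-path behaviour explained above can be checked from Python with
shutil.which, which looks a command up the same way the shell does. This is a
small sketch; the function name find_executable and the extra_dir parameter
are hypothetical and only serve the illustration.

```python
import os
import shutil


def find_executable(name, extra_dir=None):
    """Return the path of `name` from PATH, or from extra_dir, or None."""
    found = shutil.which(name)
    if found is None and extra_dir is not None:
        # On Unix-like systems the current directory is not searched by
        # default, so a build directory has to be named explicitly.
        found = shutil.which(name, path=extra_dir)
    return found
```

If find_executable("tesseract") returns None right after 'make', the binary
exists only in the build directory; 'make install' or an explicit path such as
./tesseract is then needed.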
>> --
>>  jimregan, that's because deep inside you, you are evil.
>>  Also not-so-deep inside you.
>>



Re: Alternatives to recompiling with libtiff?

2010-09-16 Thread zdenko podobny
Hi,

For conversion I use ImageMagick (
http://www.imagemagick.org/script/index.php). It has the tool "convert"
with the option -compress.
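
A batch conversion with the "convert" tool could be driven from Python as
sketched below. The function names are hypothetical, and the value None for
-compress (meaning no compression) is an assumption about what the questioner
needs; the script also assumes that ImageMagick's convert is on the PATH.

```python
import subprocess


def uncompress_tiff_command(src, dst):
    """Build the ImageMagick command that rewrites src without compression."""
    return ["convert", src, "-compress", "None", dst]


def uncompress_tiff(src, dst):
    """Run the command; raises CalledProcessError if convert fails."""
    subprocess.run(uncompress_tiff_command(src, dst), check=True)
```

Building the argument list separately from running it keeps file names with
spaces safe, because no shell quoting is involved.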

If you need a tool with a GUI, you can also use IrfanView on Windows. When
saving the image, select the option "Show option dialog" and then you can
choose the compression type for tif.

BTW: Tesseract 3 has no problem (for me ;-) ) with OCR of other formats
(png, jpg) as well.

Zd.

On Thu, Sep 16, 2010 at 9:12 PM, kevinlcarlson wrote:

>
> It appears that Tesseract is only able to scan TIFF files (not well
> documented...).  Is there a site with compiled Win binaries
> incorporating libtiff to support compressed TIFF files?
>
> Alternatively, is there a command-line utility to convert compressed
> black and white TIFF files to uncompressed TIFF so they can be scanned
> with the existing Win binaries?  It appears that Ghostscript cannot
> create uncompressed black and white TIFF files.
>
> Thanks,
> Kevin
>



Re: FAILURE! box overlaps no blobs or blobs in multiple rows

2010-09-28 Thread zdenko podobny
Please also send the box file for that image.

Zd.

On Tue, Sep 28, 2010 at 5:51 PM, Bumbi  wrote:

> Here is the link to the image:
>
> http://www.sendspace.com/file/wrpke8
>
> Thanks for the help!
>
> On szept. 28, 16:33, "Jimmy O'Regan"  wrote:
> > On 28 September 2010 13:11, Bumbi  wrote:
> >
> > > If I upload the tif file, can somebody tell me what's wrong with it?
> >
> > I think that would be best.
> >
> > --
> >  jimregan, that's because deep inside you, you are evil.
> >  Also not-so-deep inside you.
>



Re: FAILURE! box overlaps no blobs or blobs in multiple rows

2010-09-28 Thread zdenko podobny
Based on my experience: it is very difficult to train on scanned images that
you do not "create" yourself (e.g. where you have no possibility to adjust the
spacing as described in
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
 )

In your case I would try to find a font similar to the one in the image, and
then I would create training images with enough spacing to avoid
"FAILURE! box overlaps".

Zd.

On Tue, Sep 28, 2010 at 7:43 PM, Bumbi  wrote:

> And what can I do?
> Otherwise, if I resize the original training image with Lanczos to 200%
> it works. If I resize with Mitchell to 200%, I get an error message.
>
> On szept. 28, 18:43, "Jimmy O'Regan"  wrote:
> > On 28 September 2010 17:13, zdenko podobny  wrote:
> >
> > > send also box file for that image.
> >
> > No need. There's a known issue regarding spacing in training images,
> > and given how small the characters are and how close they are to each
> > other, I'm 99% sure that's what's happening here.
> >
> > --
> >  jimregan, that's because deep inside you, you are evil.
> >  Also not-so-deep inside you.
>



Re: Help on training tesseract for new language

2010-09-29 Thread zdenko podobny
First of all: always specify the version you use. Based on the error, I guess
it is 3.00 (prerelease).
On Linux I do not need to set TESSDATA_PREFIX (it is needed only if you want
to use a tessdata directory other than the standard one). I expect that you
set the wrong TESSDATA_PREFIX.
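
One way to check a TESSDATA_PREFIX value is to compute where the language
file would be looked for. The sketch below rests on an assumption that is not
stated in this thread: that the prefix names the parent directory of tessdata,
so the file is expected at <prefix>/tessdata/<lang>.traineddata. The function
name traineddata_path is hypothetical.

```python
import os


def traineddata_path(lang, environ=None):
    """Return the expected <lang>.traineddata path, or None if the prefix is unset."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    # Assumption: the prefix is the parent of the tessdata directory.
    return os.path.join(prefix, "tessdata", lang + ".traineddata")
```

Calling os.path.exists on the returned path shows quickly whether the prefix
points at the directory that actually holds the data files.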

Zd.


On Wed, Sep 29, 2010 at 8:18 AM, TesseractNoob wrote:

> Thanks for your reply. OK, I think I compiled both tesseract 2.04 and
> the tesseract 3 pre-release on the same machine, so that may have caused the
> issue. Now I cleaned both builds using make cleandist.
> After that I ran the ./configure, make and make install commands. The
> tessdata prefix was exported as export TESSDATA_PREFIX=/usr/local/
> share/.
>
>
> So I tried to run the make box command. I am building this on a Mac
> machine. It resulted in an eng.traineddata not found message. So how
> come this happens if it's not related to the 2.04 version? So I tried
> copying an eng.traineddata file to the tessdata folder and that error
> seems to be fixed. Then I ran the box.train command to make the .tr file. I
> am running this command from the tessdata/configs directory. This gave
> a result of variable not found: display_text. Can anyone help me to
> fix this issue? Also I am not sure whether I am following the right
> procedure. Please confirm whether I am correct.
>
> Please guide me.
>
> Thank you.
>
>
>
> On Sep 29, 3:54 am, "Jimmy O'Regan"  wrote:
> > On 28 September 2010 19:44, TesseractNoob  wrote:
> >
> > > Thanks for your reply. So how I am supposed to make the .traineddata
> > > file in old version?
> >
> > You don't. Tesseract 2.* uses individual data files.
> >
> > --
> >  jimregan, that's because deep inside you, you are evil.
> >  Also not-so-deep inside you.
>



Re: Help on training tesseract for new language

2010-10-01 Thread zdenko podobny
You wrote you compiled it. So I expect you know which version you compiled.

Anyway, in version 3.00 on Linux (not on Windows) you can use "tesseract -v"
to see the version.
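
The version check can also be done from a script. This is a minimal sketch;
the function name tesseract_version is hypothetical, and both output streams
are captured because this thread does not say on which one the version text is
printed.

```python
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the text printed by `<executable> -v`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [executable, "-v"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured text
            universal_newlines=True,
        )
    except OSError:
        # Covers "command not found" and similar start-up failures.
        return None
    return result.stdout.strip()
```

A return value of None means the executable was not found at all, which is a
different problem from running the wrong version.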

Zd.

On Thu, Sep 30, 2010 at 6:05 AM, TesseractNoob wrote:

> Thank you for your reply.
>
> How do I make sure that I am running version 2.04 of tesseract?
>
> On Sep 29, 4:07 pm, zdenko podobny  wrote:
> > First of all: always specify the version you use. Based on the error, I
> > guess it is 3.00 (prerelease).
> > On Linux I do not need to set TESSDATA_PREFIX (it is needed only if you
> > want to use a tessdata directory other than the standard one). I expect
> > that you set the wrong TESSDATA_PREFIX.
> >
> > Zd.
> >
> > On Wed, Sep 29, 2010 at 8:18 AM, TesseractNoob  >wrote:
> >
> > > Thanks for your reply. OK, I think I compiled both tesseract 2.04 and
> > > the tesseract 3 pre-release on the same machine, so that may have caused
> > > the issue. Now I cleaned both builds using make cleandist.
> > > After that I ran the ./configure, make and make install commands. The
> > > tessdata prefix was exported as export TESSDATA_PREFIX=/usr/local/
> > > share/.
> >
> > > So I tried to run the make box command. I am building this on a Mac
> > > machine. It resulted in an eng.traineddata not found message. So how
> > > come this happens if it's not related to the 2.04 version? So I tried
> > > copying an eng.traineddata file to the tessdata folder and that error
> > > seems to be fixed. Then I ran the box.train command to make the .tr file.
> > > I am running this command from the tessdata/configs directory. This gave
> > > a result of variable not found: display_text. Can anyone help me to
> > > fix this issue? Also I am not sure whether I am following the right
> > > procedure. Please confirm whether I am correct.
> >
> > > Please guide me.
> >
> > > Thank you.
> >
> > > On Sep 29, 3:54 am, "Jimmy O'Regan"  wrote:
> > > > On 28 September 2010 19:44, TesseractNoob 
> wrote:
> >
> > > > > Thanks for your reply. So how I am supposed to make the
> .traineddata
> > > > > file in old version?
> >
> > > > You don't. Tesseract 2.* uses individual data files.
> >
> > > > --
> > > >  jimregan, that's because deep inside you, you are evil.
> > > >  Also not-so-deep inside you.
> >



Re: Tesseract 3.00 Released

2010-10-01 Thread zdenko podobny
On Fri, Oct 1, 2010 at 3:21 AM, Jimmy O'Regan  wrote:

> Tesseract release notes Sep 30 2010 - V3.00
>  * Preparations for thread safety:
>     * Changed TessBaseAPI methods to be non-static
>     * Created a class hierarchy for the directories to hold instance data,
>       and began moving code into the classes.
>     * Moved thresholding code to a separate class.
>  * Added major new page layout analysis module.
>  * Added HOCR output.
>  * Added Leptonica as main image I/O and handling. Currently optional,
>    but in future releases linking with Leptonica will be mandatory.
>  * Ambiguity table rewritten to allow definite replacements in place
>    of fix_quotes.
>  * Added TessdataManager to combine data files into a single file.
>  * Some dead code deleted.
>  * VC++6 no longer supported. It can't cope with the use of templates.
>  * Many more languages added.
>  * Doxygenation of most of the function header comments.
>
> As well as a number of new languages, bugfixes, and man pages.
>
> Windows binaries will follow shortly.
>
>
Windows binaries are now available on the download page [1]. Do not forget to
download a language file.

Zd.


[1] http://code.google.com/p/tesseract-ocr/downloads/list




Re: Tesseract 3.00 Released

2010-10-02 Thread zdenko podobny
On Sat, Oct 2, 2010 at 5:22 AM, Sriranga(77yrsold)
wrote:

> Zdenko,
>
> Downloaded the Windows binaries and they work fine on WinXP. Congratulations!!!
> It would have been nice if you had
> included the relevant source code, like tesseract.sln for VS2008 C++ etc., for
> the Windows platform also.
>
Did you try to look at
http://tesseract-ocr.googlecode.com/files/tesseract-3.00.tar.gz? (it has the
label "Tesseract 3.00 source" ;-) )

If you have a problem with it, then try svn revision r498.

Zd.




Re: Tesseract 3.00 Released

2010-10-03 Thread zdenko podobny
Create an issue (http://code.google.com/p/tesseract-ocr/issues/list), but first:
1) do an svn checkout (not svn update - my experience from last week is that
'svn update' can be tricky in some cases on Windows, but these problems
may depend on user experience ;-) )
2) make sure your VC++ installation is correct.

Zd.

On Sun, Oct 3, 2010 at 7:44 AM, Sriranga(77yrsold)
wrote:

> Tried with svn r498 but got the same problem: tested in VC2008 C++, but it
> failed with an error message.
> Best Regards,
> -sriranga(78yrsold)
>
>
> On Sat, Oct 2, 2010 at 3:45 PM, Sriranga(77yrsold) <
> withblessi...@gmail.com> wrote:
>
>> Zdenko,
>> I tried to download from the website as suggested and to compile in
>> VS2008, but an error message was displayed.
>> This is brought to your kind notice. I shall try svn r498.
>> BestRegards,
>> -sriranga(78yrsold)
>>
>>



Re: Tesseract 3.00 Released

2010-10-04 Thread zdenko podobny
On Tue, Oct 5, 2010 at 12:36 AM, Malky  wrote:

> I've compiled tesseract (and it works) but I don't know how to use the
> language files from here:
> https://code.google.com/p/tesseract-ocr/downloads/list
>
> I've unpacked language files into /usr/local/share/tessdata/ but I get
> the error message "Error openning data file /usr/local/share/tessdata/
> english.traineddata" (or any other language) if I use the -l option
> even for english. I've tried different language files and the message
> was the same (of course, different names). If I do not choose the -l
> option it works (as Engish). So how can I choose the languages indeed?
>
>
It looks like a problem with paths. Can you please post the result of 'which
tesseract'?

Zd.




Re: Tesseract 3.00 Released

2010-10-05 Thread zdenko podobny
On Tue, Oct 5, 2010 at 10:17 AM, Jimmy O'Regan  wrote:

> On 5 October 2010 07:45, zdenko podobny  wrote:
> >
> > On Tue, Oct 5, 2010 at 12:36 AM, Malky  wrote:
> >>
> >> I've compiled tesseract (and it works) but I don't know how to use the
> >> language files from here:
> >> https://code.google.com/p/tesseract-ocr/downloads/list
> >>
> >> I've unpacked language files into /usr/local/share/tessdata/ but I get
> >> the error message "Error openning data file /usr/local/share/tessdata/
> >> english.traineddata" (or any other language) if I use the -l option
> >> even for english. I've tried different language files and the message
> >> was the same (of course, different names). If I do not choose the -l
> >> option it works (as Engish). So how can I choose the languages indeed?
> >>
> >
> > It looks like problem with paths. Can you please post result of  'which
> > tesseract'?
>
> The traineddata files are gzipped: did you uncompress them? (gzip -d)
>
>
It is solved. The problem was using "-l english" instead of "-l eng".
I have already received other feedback that the user documentation should be
improved.
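
A minimal sketch of a check that would have caught this: the name given to -l
has to match the base name of a .traineddata file in the tessdata directory
(the error quoted above shows tesseract looking for english.traineddata). The
helper name list_langs is hypothetical, and the directory in the example is
the one from Malky's message.

```shell
# Hypothetical helper: print the language codes that can be passed to -l,
# i.e. the base names of the *.traineddata files in the given directory.
list_langs() {
    for f in "$1"/*.traineddata; do
        [ -e "$f" ] || continue      # directory has no traineddata files
        basename "$f" .traineddata
    done
}

# Example (path taken from the message above; image.tif is a placeholder):
#   list_langs /usr/local/share/tessdata
#   tesseract image.tif out -l eng
```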




Released Windows installer (tesseract-ocr-setup-3.00.exe)

2010-10-12 Thread zdenko podobny
The Windows installer for Tesseract-OCR 3.00 has been released
(tesseract-ocr-setup-3.00.exe).

Features:
- Detection of an already installed Tesseract-OCR (Tesseract must have been
installed via the installer)
- English language data are included
- Option to download other language data in the installer
- The installer adapts the PATH environment variable of the current user (i.e.
the user who installed tesseract)
- The installer sets the TESSDATA_PREFIX environment variable for the current
user

Proposals for Tesseract graphics (logo, artwork) are welcome.

This installer should replace tesseract-3.00.win32.zip.
I would also like to point out part of the README (
http://code.google.com/p/tesseract-ocr/wiki/ReadMe#General):

*The dll isn't properly working either. The BaseAPI is equipped with a
dllexport for Windows. I strongly recommend all new dll use to go through
the BaseAPI where possible, as this is most likely to keep working in future
versions as we move towards thread-safety.*

Unfortunately, the relevant files (tessdll.dll, tessdll.lib, dlltest.exe) were
included in the tesseract-3.00.win32.zip package.

Credits:
Sergey Bronnikov - thanks for creating installer script and tests.

Best regards,

Zdenko




Re: 3.01 code

2010-11-28 Thread zdenko podobny
Just a note, in case somebody has not noticed it yet:

in svn (http://code.google.com/p/tesseract-ocr/source/checkout, revision 527)
there is 3.01 code that was built successfully on Linux (Mandriva Linux
Cooker 64-bit) and Windows (XP SP3, VC++ 2008 Express). There is word that
additional 3.01 code is coming from Ray in the (near) future.

So please try it on other platforms/systems and report problems or submit
patches as an issue at http://code.google.com/p/tesseract-ocr/issues/list.

If you are willing to create a C wrapper (see
http://code.google.com/p/tesseract-ocr/issues/detail?id=386,
http://code.google.com/p/tesseract-ocr/issues/detail?id=362,
http://groups.google.com/group/tesseract-dev/browse_thread/thread/a348e5a6dbade5d7)
this could be a good time for a first version ;-) so that it can become part
of the final 3.01 code.

Zd.

On Fri, Oct 1, 2010 at 11:26 PM, Jimmy O'Regan  wrote:

> I've put the code of 3.01 on GitHub -
> http://github.com/jimregan/tesseract-ocr. I'd intended to push the
> merge into SVN today, but stupidly used the http address when building
> the git repository instead of the https address, so I can't push back
> directly without rewriting the references. That might be for the best
> though, as I think it might be worth leaving 3.00 as is for a week or
> two, before pushing out the update.
>
> There might still be a few glitches in the build system.
>
> --
>  jimregan, that's because deep inside you, you are evil.
>  Also not-so-deep inside you.
>




Re: 3.01 code

2010-11-30 Thread zdenko podobny
The Windows build should be fixed in r543. I did not notice any problem
regarding leptonica (but I only ran OCR on a few of my test images) ;-)

Zd.

On Tue, Nov 30, 2010 at 2:37 AM, Ray Smith  wrote:

> My merge with the latest Google code is now complete and committed.
>
> The svn autotools are currently horribly broken for me. (Using make dist
> and then trying to build from the tar.gz distribution) I had to make the
> following patches in order for it to build, but when it did, it worked:
>
> * make whines about missing .Plo files and missing .Po files in .libs/* I
> had to copy them from my earlier version of 3.01, in which they were all
> created by make. I suspect this is a problem with the gettext system. I
> built my makefiles with no options to runautoconf and configure on linux
> Lucid.
>
> * libtool does not exist in the default distribution. I copied that from my
> earlier version of 3.01.
>
> This version will not compile with any known version of leptonica! Only
> 1.67 and above are compatible at the source level, but the distribution of
> 1.67 builds a .so.0 which tesseract fails to find, even after removing the
> apt-get version of 1.64. leptonica 1.68 will be out soon to fix this
> problem, but in the mean time, I am uploading a .deb package of liblept 1.67
> that outputs .so.1. To fix this temporarily there is a couple of debian
> packages in a debian directory that can be used to build on 64 bit linux
> systems. I may fix this better tomorrow by removing the dependency on the
> function that needs 1.67. This probably also breaks the Windows build.
>
> Ray.
>
>
>



Re: First use of tesseract

2011-01-27 Thread zdenko podobny
On Thu, Jan 27, 2011 at 7:21 PM, Grimble  wrote:

> Mandriva 2010.2
> Compiled tesseract 3.0 and Leptonlib-1.67, and moved eng.traineddata to
> /usr/local/share/tessdata. Scanned one sheet with xsane to create out.tiff
> When I ran tesseract, I get
> [graeme@mozart ~]$ tesseract out.tiff nci.txt
> Tesseract Open Source OCR Engine with LibTiff
>
This means that tesseract was not compiled against leptonica... Check your
installation.


> Image file out.tiff cannot be opened!
>

This usually indicates that there is no out.tiff ;-)

> Segmentation fault
>
> Is there a solution for this?
>

Yes, check your installation. Did you also install tesseract from the
distribution (AFAIK Mandriva ships tesseract 2.04)? Try running tesseract with
its full path to avoid running the wrong version.
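
To see which tesseract binaries are on the PATH, and in which order the shell
would pick them, something like the following sketch can help. The function
name find_on_path is hypothetical, and /usr/local/bin is only the usual
default prefix for a source build, so adjust it to your system.

```shell
# Hypothetical helper: print every executable file named $1 found in the
# directories of $PATH, in the order the shell searches them.
find_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        candidate="${dir:-.}/$1"     # an empty PATH entry means "."
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

# Example: the first line printed is the binary a plain "tesseract" runs.
#   find_on_path tesseract
# Running a specific build by its full path (assumed location):
#   /usr/local/bin/tesseract out.tiff nci
```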


> Is tesseract trained for english at installation?


Trained data are in the language files (e.g. eng.traineddata in the case of
tesseract 3.00).

> If not, what is the precise command to perform the training?
>

If you want to train a language yourself, please read the project wiki pages.


> Thanks
>



Re: what am i missing? tesseract runs but no output

2011-02-18 Thread zdenko podobny
Hi,

Just a quick reply:
I tried it on Windows XP with tesseract 3.00 and it produced a bad result
(nothing useful).

IrfanView's information dialog showed that the image has a resolution of
72x72 DPI -> too low...
So I resampled it (with the Lanczos algorithm) from 100% to 300% size, set the
DPI to 300 and decreased the number of colors to 16 (in IrfanView, because I
have no time to play with ImageMagick's options ;-) )...
Then the OCR result was much better, with several mistakes (just a quick
check)...

So with several image improvements you can get a good OCR result.
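
For those who prefer ImageMagick, a rough equivalent of the resampling steps
described above might look like the sketch below. It is untested against this
particular image and the options may need tuning. The function name
prepare_for_ocr and the CONVERT override are hypothetical; the file names in
the example are the ones from Bob's message.

```shell
# Hypothetical helper: upscale an image for OCR with ImageMagick.
#   $1 = input image, $2 = output TIFF
# Set CONVERT to use a different binary (or "echo" for a dry run).
prepare_for_ocr() {
    "${CONVERT:-convert}" "$1" \
        -filter Lanczos -resize 300% \
        -units PixelsPerInch -density 300 \
        -colors 16 +compress "$2"
}

# Example:
#   prepare_for_ocr test_page.png test_input.tif
#   tesseract test_input.tif output -l eng
```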

BR,

Zd.

On Fri, Feb 18, 2011 at 3:53 PM, Bob Kuo  wrote:

> Hello all
>
> Please forgive the newbie question. I've seen this posted several
> times before, and I thought I had the right solution but apparently
> not.  Attached is a PNG that I'd like to run through tesseract.  I
> used ImageMagick's convert to change it into a tiff:
>
> convert -density 200 -units PixelsPerInch test_page.png -type
> Grayscale +compress test_input.tif
>
> (I've also tried to do this at -density 300 with the same results)
>
> The resulting TIF is attached.  When I run it through tesseract I get
> an output file that is one byte and is basically blank.  Command and
> output below.
>
> tesseract test_input.tif output -l eng
> Tesseract Open Source OCR Engine
> Image has 8 * 1 bits per pixel, and size (375,350)
> Resolution=200
>
> I saw some other threads about a similar problem, but the solutions
> were to scale it to 200 or 300 DPI, make sure it was in grayscale,
> remove the alpha layer, and somewhere else it said it was fixed in
> Tesseract 2.04.  I'm using Tesseract 2.04 on Mac OS X 10.6.6 and
> ImageMagick 6.6.7-1.  Is my image just unsuitable for OCR-ing?
>
> I appreciate any help.
>
> Thanks,
>
> Bob
>



Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases

2011-02-21 Thread zdenko podobny
Hello,

can you please post a link where I can find the "speedy-ocr bash script"?

Zd.

On Tue, Feb 8, 2011 at 10:06 AM, SpeedyChair  wrote:

>   Another way to prepare a PDF document for tesseract is to use the
> 'convert' command from the ImageMagick package to split an image only PDF
> file into a series of GrayScale TIFF images, one for each page.  This
> convert command can work on just about any image.  For PDF conversions, it
> actually makes ghostscript do all of the work.  This same syntax also works
> with multi-page TIFF files and Postscript files.
>
> convert mydoc.pdf -type GrayScale -depth 8 -scene 1 mydoc-%03d.tif
>
> Then you would need to loop through the TIFF files to perform OCR on each
> page image.  In a day or two, I will update my speedy-ocr bash script, which
> will now handle PDF image files.
>
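
A minimal sketch of such a loop (not the speedy-ocr script itself, which is
not shown in this thread) is given below. It assumes the pages were written as
mydoc-001.tif, mydoc-002.tif, ... by the convert command above; the function
name ocr_pages and the TESSERACT override are hypothetical.

```shell
# Hypothetical helper: run tesseract on every page image "$1-NNN.tif".
# Each page's text goes to "$1-NNN.txt" (tesseract appends the .txt itself).
ocr_pages() {
    for page in "$1"-*.tif; do
        [ -e "$page" ] || continue           # no page images found
        "${TESSERACT:-tesseract}" "$page" "${page%.tif}" -l eng
    done
}

# Example:
#   convert mydoc.pdf -type GrayScale -depth 8 -scene 1 mydoc-%03d.tif
#   ocr_pages mydoc
```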


> Don Marang
> Vinux Software Coordinator - vinux.org.uk
>
> There is just so much stuff in the world that, to me, is devoid of any real
> substance, value, and content that I just try to make sure that I am working
> on things that matter.
> Dean Kamen
>
>  *From:* KHEM Sochenda 
> *Sent:* Monday, February 07, 2011 10:23 PM
> *To:* tesseract-ocr@googlegroups.com
> *Subject:* Re: VietOCR v2.0/3.1 & VietOCR.NET v2.0 Releases
>
> Dear Quan,
>
> I would like to know how to let tesseract OCR work with pdf documents.
>
> Thank you very much in advance for you kind response.
>
> With Best Regards,
>
> Sochenda
>
> On Tue, Feb 8, 2011 at 7:56 AM, Quan Nguyen  wrote:
>
>> A Java/.NET GUI frontend for Tesseract OCR engine. The releases
>> include the following fixes and improvements:
>>
>> * Add support for spellcheck suggestion in context menu
>> * Improve program accessibility and usability
>> * Add support for downloading and installing language data packs and
>> appropriate spell dictionaries
>> * Add UI localization for Lithuanian and Slovak
>> * Update Tesseract OCR engine to 3.01 (r551) (v3.1 only)
>>
>> http://vietocr.sf.net
>>



Re: [Tesseract 3] English training text

2011-02-22 Thread zdenko podobny
I doubt that Google will release their (full) training set :-(

Have a look in svn at the file eng.cube.size [1]. There you can see the names
of the fonts that were used for training English in 3.01. As far as I
understand, there is an (unpublished/not released) possibility to train
language data directly on font files. Unfortunately there are no details about
the "cube" part of training.

Zd.

[1] 12.4 MB!
http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/eng.cube.size

On Wed, Feb 9, 2011 at 5:48 PM, Sly_bzh  wrote:

> I would like to train tesseract for English with some special fonts.
> Tesseract training documentation says that a text should be prepared
> and it must follow some important points (see
>
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
> )
>
> Could someone provide to the community the content of a good and
> efficient text for english training ?
>
> Note : I think it could be useful to provide the texts that have been
> used to build the training files that could be downloaded in the
> "Download" section (http://code.google.com/p/tesseract-ocr/downloads/
> list). What do you think about that ?
>
> Thanks !
>



Re: [Tesseract 3] English training text

2011-02-22 Thread zdenko podobny
Dmitry,

unfortunately I do not have enough time for tests :-(. I still hope Ray will
release more info before the final 3.01. At the moment I am focusing on the
box editor.

BR,

Zdenko

On Tue, Feb 22, 2011 at 9:27 AM, Dmitry Silaev wrote:

> Interesting. I have been wondering about Cube since its traces began to
> appear in the source code, but have not had enough time to investigate it
> thoroughly.
>
> Zdenko, would you please kindly share your other findings on Cube?
>
> Regards,
> Dmitry
>



Re: pixReadFromTiffStream: failed to read tiffdata

2011-02-25 Thread zdenko podobny
see:
http://code.google.com/p/tesseract-ocr/issues/detail?id=340
http://code.google.com/p/tesseract-ocr/issues/detail?id=391
http://code.google.com/p/tesseract-ocr/issues/detail?id=443

Zdenko

On Fri, Feb 25, 2011 at 9:15 AM, Nicolas Raoul wrote:

> I get the following error on a TIFF created from a PDF by ImageMagick:
>
> tesseract file.tiff ocred -l eng
> Tesseract Open Source OCR Engine with Leptonica
> TIFFstream: Sorry, can not handle image.
> Error in pixReadFromTiffStream: failed to read tiffdata
> Error in pixReadStreamTiff: pix not read
> Error in pixReadTiff: pix not read
>
> TIFF created from a PDF with ImageMagick:
> convert file.pdf -depth 4 file.tiff
>
> TIFF information (truncated):
> tiffinfo file.tiff
> TIFF Directory at offset 0xb792a (751914)
>  Subfile Type: multi-page document (2 = 0x2)
>  Image Width: 595 Image Length: 842
>  Resolution: 72, 72 (unitless)
>  Bits/Sample: 4
>  Compression Scheme: None
>  Photometric Interpretation: RGB color
>  FillOrder: msb-to-lsb
>  Orientation: row 0 top, col 0 lhs
>  Samples/Pixel: 3
>  Rows/Strip: 9
>  Planar Configuration: single image plane
>  Page Number: 0-35
>  DocumentName: scanned.tiff
>  Software: ImageMagick 6.6.2-6 2010-12-02 Q16 http://www.imagemagick.org
> TIFF Directory at offset 0x16f688 (1504904)
>  Subfile Type: multi-page document (2 = 0x2)
>  Image Width: 595 Image Length: 842
> [...]
>
> Tesseract 3.0 compiled from source (leptonica and libtiff installed)
> on Ubuntu 2010.04.
>
> Is there a problem with my TIFF?
> Could someone point me to a TIFF file that is known to work with
> Tesseract?
> Thanks a lot!
>
> Nicolas Raoul
> ECM consultant in Tokyo
>



Re: pixReadFromTiffStream: failed to read tiffdata

2011-02-27 Thread zdenko podobny
Hello Nicolas,

"Error in pixRead*" is an error message from leptonica. Tesseract uses the
leptonica library for opening images. That means that if leptonica cannot open
a file, tesseract cannot *use* it.

So far this error has ALWAYS been caused by a problem with the installation of
leptonica. As suggested in those issues, build the example leptonica programs
and test your image with them.

Zdenko

On Mon, Feb 28, 2011 at 2:53 AM, Nicolas Raoul wrote:

> Hello Zdenko,
>
> The 3 issues you linked to are easy "function not present" errors.
> I don't get this error at all.
>
> I get "failed to read tiffdata", which returns no results in Google,
> so I believe it is a very new error, that has never been discussed
> before.
>
> Thanks for your fast reply!
> Nicolas Raoul
>



Re: Tesseract 3.00 Released

2011-03-02 Thread zdenko podobny
On Sun, Oct 24, 2010 at 11:58 PM, Jimmy O'Regan  wrote:

> On 20 October 2010 23:15, Jimmy O'Regan  wrote:
> > On 21 October 2010 06:29, Jeffrey Ratcliffe 
> wrote:
> >> Debian requires that each shared library have its own package. At the
> >> moment, that would require the following extra packages:
> >
> > That doesn't sound right. I'll check into it at the mentor summit.
>
> I discussed this with Ray and he prefers having a single library, so
> I'm going to make the multiple libraries a non-default option. This'll
> surface in Tesseract 3.01, but I can certainly provide you with a
> patch for 3.00 when I get a chance to write it up.
>

I would like to point out one issue related to this: naming. If my findings
are correct, this is the current situation (please feel free to correct me):

Name of tesseract 2.04 library on Windows: tessdll.dll (tessdll.lib)
Name of tesseract 2.04 library for .Net on Windows: tessnet2_32.dll
(tessnet2_64.dll)
Name of tesseract (2.04?) library on Android: libocr.so (but maybe this is
an issue only of https://code.google.com/p/eyes-free, because I think I saw
libtess somewhere)
Name of tesseract 3.01 library for .Net on Windows: tesseractengine3.dll
Name of tesseract 3.01 library for python (created with SWIG) on Linux:
_tesseract.so
Name of tesseract 3.01 library on Linux (as mentioned by
Jeffrey): libtesseract_api.so.3.0.1, libtesseract_main.so.3.0.1,
libtesseract_ccstruct.so.3.0.1, libtesseract_neural.so.3.0.1,
libtesseract_ccutil.so.3.0.1, libtesseract_tessopt.so.3.0.1,
libtesseract_classify.so.3.0.1, libtesseract_textord.so.3.0.1,
libtesseract_cube.so.3.0.1, libtesseract_training.so.3.0.1,
libtesseract_cutil.so.3.0.1, libtesseract_viewer.so.3.0.1,
libtesseract_dict.so.3.0.1, libtesseract_wordrec.so.3.0.1,
libtesseract_image.so.3.0.1 (libtesseract_api.a,  libtesseract_main.a,
libtesseract_ccstruct.a,  libtesseract_neural.a, libtesseract_ccutil.a,
 libtesseract_pageseg.a, libtesseract_classify.a,  libtesseract_tessopt.a,
libtesseract_cube.a,  libtesseract_textord.a, libtesseract_cutil.a,
 libtesseract_training.a, libtesseract_dict.a,  libtesseract_viewer.a,
libtesseract_full.a,  libtesseract_wordrec.a, libtesseract_image.a)

I believe it would be good if we could agree on a "naming standard".

Zdenko




Re: can't read frequent_words_list file

2011-03-04 Thread zdenko podobny
Please provide more information: how you are trying to create the dictionary,
your platform, and the exact version of Tesseract (and perhaps how you got it).

Zdenko

On Fri, Mar 4, 2011 at 2:50 PM, Sang Đặng Minh
wrote:

> hi all. my name is Sang. I'm trying to train Tesseract 2.0, everything
> is ok, but i can't create DAWG files, the error is: Could not open
> file frequent_words_list.
> Please help me!
> thanks a lot!
>



Re: What is everything I need for the linux version in English?

2011-03-20 Thread zdenko podobny
Did you try reading the wiki (http://code.google.com/p/tesseract-ocr/wiki/),
e.g. the ReadMe page?

Zdenko

On Sun, Mar 20, 2011 at 2:20 AM, LAPIII  wrote:

> I read through the list on the Downloads page, but couldn't understand
> everything I needed for an install on Linux.
>



Re: Tesseract compilation on code blocks (gcc + mingw)

2011-03-22 Thread zdenko podobny
Hi,

I tried (as an exercise ;-) ) to use cmake (http://cmake.org/) for building
tesseract, because it would allow one build system to be used on more
platforms. I have a first version (not very sophisticated :-) ) that works on
linux (with gcc), but I failed on windows with mingw. The problem is that mingw
is missing at least one function that is used by tesseract - 'strtok_r' (see
http://sourceforge.net/tracker/?func=detail&aid=2673480&group_id=2435&atid=352435).
I have no time to play with porting and testing (or with using the external
library pthreads-win32) when there is another working solution (VC++2008). cmake
also supports Visual Studio 6-10, but I have not tried it yet (cmake is not my
priority).

Zdenko

On Mon, Mar 21, 2011 at 4:56 AM, Saurabh Gandhi wrote:

> Hello,
>
> Has anyone tried compiling tesseract successfully on Code::Blocks
> (gcc+mingw)? If yes, could you post the steps that you followed / changes
> that you made for successful compilation?
>
> Thank you in advance.
>
> --
> Regards,
> Saurabh Gandhi
>
>



Re: tesseract.exe has stopped working on win2008 r2

2011-03-23 Thread zdenko podobny
Hi,

tesseract is a command line tool. The item in the windows menu is more or less
just for testing purposes (it will not be present in the next version of the
tesseract installer).

If you need a gui, have a look at VietOCR, PDF OCR X, lector etc.

Zdenko

On Wed, Mar 23, 2011 at 4:50 PM, moos3  wrote:

> I have been trying to get the latest version that's available for download
> to work on windows 2008 server r2. The moment it goes to process the
> file, an instant "Has stopped working" message appears on the screen. I was
> wondering if anyone could build a new windows release or knows how to fix
> this issue?
>
>
>
> Thanks.
>



Re: tesseract.exe has stopped working on win2008 r2

2011-03-26 Thread zdenko podobny
Convert it to png: you get a smaller picture with the same quality, and
tesseract should process it without problems.

Zdenko

On Fri, Mar 25, 2011 at 5:03 PM, Richard Genthner wrote:

> Here is the screenshot and the tif file. Dmitri if you rename the .exe that
> should work. I'm trying to get the training data up.
>



Re: tesseract.exe has stopped working on win2008 r2

2011-03-26 Thread zdenko podobny
On Fri, Mar 25, 2011 at 5:40 PM, Lutz, Michael  wrote:

>  Hi,
>
> I just ran your tif file, I get no results, it must have something to do
> with the size of the image. If I try to run a portion of tiff something
> smaller than 1000x1000 then I get results.
>
> Can somebody explain why a tif size (2480x3508 @ 8BPP) is not processed?
>

This is not a tesseract issue but a leptonica one (leptonica is the library
used for image handling). When I ran it on linux I got these error messages
coming from leptonica (1.67; I have not tried 1.68 on linux yet):
Error in pixReadFromTiffStream: spp not in set {1,3,4}
Error in pixReadStreamTiff: pix not read
Error in pixReadTiff: pix not read

On Windows the leptonica "release version" library does not show error/warning
messages because of the compile option "NO_CONSOLE_IO" (see
http://code.google.com/p/leptonica/issues/detail?id=42).

It looks like leptonica does not support lzw compression for tiff (see
http://www.leptonica.com/source/README.html  "9. Image I/O" - lzw is
mentioned in the png and gif sections, but not for tif). When I changed the
tif compression from lzw to zip (BTW: this produces a smaller image),
tesseract produced output (on XP SP3).

Zdenko


> Mike
>
>
>
> *From:* Richard Genthner [mailto:rich...@guthnur.net]
> *Sent:* Friday, 25 March 2011 17:04
> *To:* Lutz, Michael
> *Cc:* tesseract-ocr@googlegroups.com
>
> *Subject:* Re: tesseract.exe has stopped working on win2008 r2
>
>
>
> Here is the screenshot and the tif file. Dmitri if you rename the .exe that
> should work. I'm trying to get the training data up.
>



Re: simple invocation of tesseract on ubuntu generates a single-byte output file

2011-03-26 Thread zdenko podobny
On Sat, Mar 26, 2011 at 2:34 PM, rpjday  wrote:

> long story short, i'm seeing this issue on my ubuntu 10.10 system:
>
>  http://ubuntuforums.org/showthread.php?t=1599686
>
> the packages i have installed:
>
>  * tesseract-ocr
>  * tesseract-ocr-eng
>
Which version did you install?


> i took a simple screenshot of some text, saved it to a .tif file, then
> ran:
>
A screenshot usually has very low DPI (96?). The suggested DPI for OCR is 300.
Have a look at VietOCR (http://sourceforge.net/projects/vietocr/); it also has
a "screenshot" mode that tries to solve this problem (yes, it also works for
languages other than Vietnamese :-) ).
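
To give a feel for what the DPI gap means in pixels, here is a minimal Python
sketch (not part of tesseract or VietOCR; the function name upscale_for_ocr is
made up for illustration). It assumes the screenshot really is at 96 DPI, which
is only a guess, and it only computes the target size; the actual resampling
would be done with an image tool of your choice.

```python
import math


def upscale_for_ocr(width_px, height_px, source_dpi=96, target_dpi=300):
    """Return (scale, new_width, new_height) needed to reach target_dpi.

    source_dpi=96 is an assumed screen resolution, target_dpi=300 is the
    value suggested for OCR in the message above.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    scale = target_dpi / source_dpi
    # Round up so the resized image is never smaller than the target size.
    return scale, math.ceil(width_px * scale), math.ceil(height_px * scale)
```

For example, a 960x480 screenshot would have to be enlarged by a factor of
about 3 in each direction before it resembles a 300 DPI scan.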


>  $ tesseract tess.tif tess
>
> which generated the output file tess.txt, whose content was a single
> byte (the newline character).
>
>  i added the option "-l eng", but that made no difference, and I get
> no diagnostics.  i also checked this out:
>
>  https://help.ubuntu.com/community/OCR
>
> but i didn't see anything that would resolve this issue.  can someone
> verify that tesseract can properly process a trivial .tif file on
> ubuntu?  i just want a working example i can use as a starting point.
> thanks.
>
> rday
>



Re: simple invocation of tesseract on ubuntu generates a single-byte output file

2011-03-26 Thread zdenko podobny
On Sat, Mar 26, 2011 at 3:56 PM, Robert P. J. Day wrote:

> On Sat, 26 Mar 2011, zdenko podobny wrote:
>
> > On Sat, Mar 26, 2011 at 2:34 PM, rpjday  wrote:
> >   long story short, i'm seeing this issue on my ubuntu 10.10 system:
> >
> >http://ubuntuforums.org/showthread.php?t=1599686
> >
> >   the packages i have installed:
> >
> >* tesseract-ocr
> >* tesseract-ocr-eng
> >
> > which version you installed?
>
>   the most recent version for ubuntu 10.10 is 2.04, and i realize
> there's a version 3.  should i manually upgrade?
>
This is a good idea ;-) Please read the README wiki page before installation.


> >   i took a simple screenshot of some text, saved it to a .tif file, then
> >   ran:
> >
> > screenshot has usually very low DPI (96?). Suggested DPI for OCR is 300.
> > Have a look on VietOCR (http://sourceforge.net/projects/vietocr/) there is
> > also "screenshot" mode that try to solve this problem (yes it work also for
> > other than Vietnamese language :-) ).
>
>   ah, thanks, i'll look into that.
>
> rday
>
> --
>
> 
> Robert P. J. Day   Waterloo, Ontario, CANADA
>http://crashcourse.ca
>
> Twitter:   http://twitter.com/rpjday
> LinkedIn:   http://ca.linkedin.com/in/rpjday
> 
>



Re: tesseract.exe has stopped working on win2008 r2

2011-03-27 Thread zdenko podobny
Three different users with a working tesseract-ocr have a problem with this
input. Isn't that sufficient for you? Yes, there could be other issues related
to Windows 7 (because the official build was done on 32-bit Windows XP), but
without solving the "input issue" any other investigation will
be unnecessarily difficult...

Zdenko

On Sat, Mar 26, 2011 at 10:04 PM, Dmitri Silaev wrote:

> Guys, I still can't understand what error is produced by
> Tesseract. Let's wait for the error screenshot. Or did you understand
> everything already? Richard says he's got an error message...
>
> Warm regards,
> Dmitri Silaev



Re: tesseract.exe has stopped working on win2008 r2

2011-03-27 Thread zdenko podobny
On Sun, Mar 27, 2011 at 12:45 AM, TP  wrote:

> On Sat, Mar 26, 2011 at 7:42 AM, zdenko podobny  wrote:
> >> Can somebody explain why a tif size (2480x3508 @ 8BPP) is not processed?
>
> The test image has 16 bpp.
>
Interesting. How did you get this information? I tried:

   - identify (imagemagick): TIFF 2480x3508 2480x3508+0+0 8-bit Grayscale
   DirectClass 1.556MB
   - IrfanView "says": Original colors: 65536   (16 BitsPerPixel);
   Current colors: 256   (8 BitsPerPixel); Number of unique colors: 41;


> > This is not tesseract but leptonica issue (library used for image handling).
> > When I run it on linux I got error message coming from leptonica (1.67 -> I
> > did not try 1.68 on linux yet):
> > Error in pixReadFromTiffStream: spp not in set {1,3,4}
> > Error in pixReadStreamTiff: pix not read
> > Error in pixReadTiff: pix not read
>
> I get the same warnings with Leptonica v1.68 on Windows XP SP3.
>
> > On Windows leptonica "release version" library did not show error/warning
> > messages because of compile option "NO_CONSOLE_IO"
> > (see http://code.google.com/p/leptonica/issues/detail?id=42).
> > It looks like leptonica did not support lzw compression for tiff (
> > see http://www.leptonica.com/source/README.html  "9. Image I/O" - lzw is
> > mentioned in png and gif section, but not with tif). I change
> > tif compression from lzw to zip (BTW: this will cause smaller image),
> > tesseract will produce ouput (on XP SP3).
>
> Incorrect. At least on Windows I build libtiff with "LZW_SUPPORT=1"
> in my nmake.opt file.
>
> You can see the actual problem by looking at
> http://tpgit.github.com/Leptonica/tiffio_8c_source.html#l00274, where
> Leptonica gets the TIFFTAG_SAMPLESPERPIXEL. It allows 1, 3, or 4 but
> not 2 as this image contains.
>

Thanks for clarifying this. As I mentioned, it was just my guess based on my
reading of the README :-)

>
>  -- TP
>



Re: tesseract.exe has stopped working on win2008 r2

2011-03-28 Thread zdenko podobny
On Mon, Mar 28, 2011 at 11:54 AM, Lutz, Michael  wrote:

> Hi All,
>
> So the image Richard gave us is a compressed TIF file. Since tesseract only
> supports uncompressed TIF images as noticed by Zdenko you will not get any
> results from this image.
>

Incorrect:

   1. Image support is the task of leptonica, so the list of supported formats
   can be found on the leptonica web site and in its source code. I think we
   really need to distinguish this, because upgrading leptonica could add
   support for a new format without changing a line of tesseract code.
   2. I guessed that leptonica has a problem with tiff with "lzw compression".
   When I created a tiff with "zip compression" it worked (there are also
   other compression algorithms available in tiff: Packbits, G4, G3,...). I
   never said that leptonica (tesseract) supports only uncompressed tiff. I am
   sorry if I was not clear about this.
   3. As TP corrected me: the problem is not in LZW compression, but in "Samples
   per Pixel". Leptonica supports 1, 3 and 4. The input image used (unsupported) 2.
   To "solve" this just open the input file in IrfanView and save it as tiff with
   lzw compression. It will change "Samples/Pixel" to 1 automatically ;-)
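
To check a file before handing it to tesseract, the "Samples per Pixel" value
can be read straight from the TIFF header. The following is a rough Python
sketch, not leptonica's code: the function names are hypothetical, it only
looks at the first image directory of a classic (non-BigTIFF) file, and the
supported set {1, 3, 4} is copied from the leptonica 1.67/1.68 error message
quoted in this thread, so other leptonica versions may differ.

```python
import struct

SAMPLES_PER_PIXEL_TAG = 277  # tag number of SamplesPerPixel in TIFF 6.0
SUPPORTED_SPP = {1, 3, 4}    # assumption: taken from the error message above


def read_samples_per_pixel(data):
    """Return SamplesPerPixel from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"   # little-endian file
    elif data[:2] == b"MM":
        endian = ">"   # big-endian file
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _field_type, _count = struct.unpack(
            endian + "HHI", data[start:start + 8])
        if tag == SAMPLES_PER_PIXEL_TAG:
            # A single SHORT is stored left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # TIFF 6.0 default when the tag is absent


def spp_is_supported(data):
    """True if the file's SamplesPerPixel is in the assumed supported set."""
    return read_samples_per_pixel(data) in SUPPORTED_SPP
```

To use it, read the whole file in binary mode and pass the bytes to
spp_is_supported(); a result of False for a file with 2 samples per pixel
would match the behaviour described in point 3.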

 Zdenko

> I attached the image as an uncompressed TIF file, see uncompressed.zip, this
> image is processed by tesseract without any problems.
> Also attached is a tesseract.zip, which should unpack a
> tesseract.executable, just rename it to tesseract.exe if it went through, it
> is a release static build using Win7 and WinSDK 7.1 if anyone still wants
> it.
>
> Regards,
> Mike
>

Re: Newbie tesseract training question

2011-03-28 Thread zdenko podobny
Can you provide the example image file (TainingMontage.png)?

Zdenko

On Mon, Mar 28, 2011 at 11:12 PM, Robin  wrote:

> Hi,
>
> I'm reasonably new to tesseract and am trying to train it to recognise
> hex characters from a dot matrix LED display.  The characters are
> clear and well spaced, but the box file generation always results in
> "Empty page".
>
> I'm using tesseract 3, installed from tesseract-ocr-setup-3.00.exe.
>
> The command line I'm using is...
>
> tesseract d:\data\TainingMontage.png d:\data\training\led.exp0
> batch.nochop makebox
>
> Changing my training image to the eurotext.tif example provided works
> as documented in the training notes.
>
> I think my trouble lies in the resolution of the individual
> characters.  Each character in the display is a 7 high x 5 wide dot
> matrix.  I have created a training image with a lot of characters.
>
> Any tips?
>
> Thanks
>



Re: Problem with Tesseract 3.00

2011-03-30 Thread zdenko podobny
Hi,

unfortunately some fixes regarding the windows build were committed after
the release of version 3.00 (=revision 498).

I thought about a 3.00.1 release (=revision 525), and as a "temporary solution" I
created a 3.00.1 tesseract.exe (somebody asked for it). Then I changed my mind,
because it looks like developers grab the latest svn version with the issues
fixed...

So I suggest you use the svn version (revision 525, or 578 if you want to test
the 3.01 version).

Zdenko

On Wed, Mar 30, 2011 at 12:16 PM, mohamed amine wrote:

> Hello
>
> I have some problems and many questions and i hope you will have
> answers:
>
> 1) When loading the whole project, "combine_tessdata" did not load
> with the 17 projects: does this cause a problem when
> generating tesseract.exe?
>
> 2) Should I execute tesseract-3.00.1.exe to have the right
> Tesseract.exe after compiling? (I did that.)
>
> 3) The Tesseract source code didn't contain a LIB directory in
> ..\VS2008\, so I copied
> leptonica-1.66-win32-lib-include-dirs into ..\VS2008. Is this the right way
> to compile the source code, or did I have to compile leptonica-1.66 on its
> own? And which version of leptonica is compatible with Tesseract 3.00 (the
> last release version)?
>
> 4) After adding the leptonica library, will png, jpeg, ... images be recognized?
>
> 5) I tried to load the project with vs2010 but had the same problem
> converting "combine_tessdata". I tried to add the missing project
> but it doesn't exist. What can I do?
>
> Thanks for help
>



Re: disable newline in table layout recognition

2011-03-30 Thread zdenko podobny
On Wed, Mar 30, 2011 at 8:55 AM, Max Cantor  wrote:

> I had a similar issue.  I couldn't get the config to work but basically
> added this line to my code and it worked:
>
>api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN);
>
> For some reason, the tesseract binary doesn't pick up the config, but I
> copied the binary source and added that.
>
This should already be fixed in svn revision 517 (see
http://code.google.com/p/tesseract-ocr/source/detail?r=517)

Zdenko


> Max
>
>
> On Mar 30, 2011, at 2:31 PM, Patrick Kirsch wrote:
>
> > Hey,
> >
> > I'm sure it has to be a config switch, but I did not find the
> > following situation in the archives, so I'm asking:
> > I would like to disable the automatic newline insert, if a table is
> > recognized.
> > Let me explain: the layout of the input image is similar to:
> >
> > Header
> > Key:   Value
> > Key2: Value2
> > ...
> >
> > but tesseract results in:
> > Header
> > Key:
> > Key2:
> > Value
> > Value2
> > ...
> >
> > But I would expect:
> > Header
> > Key: Value
> > Key2: Value2
> > ...
> >
> > I'm using: tesseract 3.00
> >
> > Regards,
> > Patrick
> >
> >



Re: tesseract-3.01 compiling issue on linux

2011-04-07 Thread zdenko podobny
Did you also try a fresh svn checkout (e.g. delete the old local svn copy or
check out to another directory)?
Which linux distribution do you use?

Zdenko

On Thu, Apr 7, 2011 at 11:30 PM, zl2k  wrote:

> I tried but no luck, 3.00 is compilable though.
>
> On Apr 7, 4:05 pm, Zdenko Podobný  wrote:
> > Did you run "./runautoconf ; ./configure"  before running make?
> >
> > I have no problem to compile revision 581 on linux.
> >
> > Zdenko
> >
> > On 07.04.2011 01:56, zl2k wrote:
> >
> > > hi, all,
> >
> > > I just checked out tesseract-3.01(r581) from svn but got the following
> > > compiling error on linux box
> >
> > > colfind.cpp:449: error: ‘boxaGetCount’ was not declared in this scope
> > > colfind.cpp:451: error: ‘l_int32’ was not declared in this scope
> > > colfind.cpp:451: error: expected ‘;’ before ‘x’
> > > colfind.cpp:452: error: ‘x’ was not declared in this scope
> > > colfind.cpp:452: error: ‘y’ was not declared in this scope
> > > colfind.cpp:452: error: ‘width’ was not declared in this scope
> > > colfind.cpp:452: error: ‘height’ was not declared in this scope
> > > colfind.cpp:452: error: ‘boxaGetBoxGeometry’ was not declared in this scope
> > > colfind.cpp:453: error: ‘L_CLONE’ was not declared in this scope
> > > colfind.cpp:453: error: ‘pixaGetPix’ was not declared in this scope
> > > colfind.cpp:456: error: ‘pixGetWidth’ was not declared in this scope
> > > colfind.cpp:494: error: ‘pixDestroy’ was not declared in this scope
> >
> > > Does anyone have a version that compiles, or is there any workaround?
> > > Thanks for help.
> >
> > > zl2k
>



Re: Tesseract OCR in daemon mode?

2011-04-08 Thread zdenko podobny
On Thu, Apr 7, 2011 at 9:33 PM, Mike Sandford  wrote:

> I don't know if it's strictly necessary for my application, but I am
> trying to analyze anywhere from a few characters up to a few lines of
> text rapidly.  Tesseract is a portion of my application pipeline.
> I've got my own document layout engine since there's a lot of really
> specialized, mostly useless domain knowledge.
>
> OCR is currently taking up over half the total analysis time.  I
> managed to reduce it from about 60% to about 20% (2sec per document
> to 0.8sec per document) by using multiprocessing: I launch several jobs
> from the command line in parallel.  That's roughly 4x speedup on a
> quad-core so that's good.  But I'm still interested in pushing
> further.  In the ideal world I'd have four tesseract daemons on all
> the time and when I need OCR done I pipe a filename in (or perhaps the
> image data) and get a string out.  Or something like that.
>
> My thought is that it takes a certain amount of time to load up the
> binary and the training data and get organized in memory.  Right now
> this whole process happens every time I need to process a file,
> perhaps 10-20 times per document.  That could be a substantial amount
> of overhead.  I fed tesseract a 1x1 white tiff 10 times and it took
> between 30ms and 44ms to load and tell me that there was no output.
> Let's just assume for a moment that those numbers aren't totally
> bogus, that means out of 0.8sec per document I'm spending 10x(30ms to
> 40ms)=300ms to 400ms of time just loading up the binary.  That could
> be half of my total document processing time.
>
> I haven't gone looking at the guts to try and figure out if this is
> possible yet.  I was hoping to get some feedback as to how dumb (or
> perhaps not!) of an idea this is before I really launched into it.  So
> what does everyone think?  Would this be helpful to anyone else?  Does
> tesseract's architecture lend itself to staying in RAM for an extended
> period of time, for multiple images?
>
> I do realize that I could potentially just write out a single image
> with all the regions of interest contained within it, but my guess is
> that tesseract does some learning about what the font is as it
> processes characters.  And since each document might have different
> fonts, font sizes, etc I think that may be more harmful than
> beneficial.
>
>
Just from my bookmarks (have-a-look-on-this-later ;-)): there is the
COSI project (The Common OCR Service Interface) [1], which uses a patched
tesseract-ocr 2.04 in server mode. I have not tested it yet, but it may
be a good starting point.

Zdenko

[1] http://sourceforge.net/apps/mediawiki/cosi/index.php?title=Main_Page
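
The start-up overhead estimate quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures from the post (10 launches per document, 0.8 sec per document, 30ms to 44ms measured per launch); the function name is made up for illustration.

```python
def startup_overhead(per_launch_ms, launches, total_ms):
    """Return (overhead in ms, overhead as a fraction of total_ms)."""
    overhead_ms = per_launch_ms * launches
    return overhead_ms, overhead_ms / total_ms


if __name__ == "__main__":
    # Figures quoted in the post: 10 launches per document, 800 ms per
    # document, and 30-44 ms per launch measured on a 1x1 white tiff.
    for per_launch_ms in (30, 44):
        overhead_ms, share = startup_overhead(per_launch_ms, 10, 800)
        print(f"{per_launch_ms} ms per launch: {overhead_ms} ms, "
              f"{share:.0%} of the per-document time")
```

If the measured launch times hold for real images, keeping the engine loaded between images would be expected to save at most this share of the per-document time.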




Re: Problem with eng.traineddata after 3 or 4 successful runs against different pdf's

2011-04-14 Thread zdenko podobny
On Wed, Apr 13, 2011 at 2:31 AM, caudex  wrote:

> After using regedit and pointing tessdata_prefix to the right place
> and running again I got an error that referred to unicharset. The
> entire contents of my tessdata subdirectory is:
>
>  Directory of C:\tesseract\Tesseract-OCR\tessdata
>
> 04/08/2011  12:50p      <DIR>          .
> 04/08/2011  12:50p      <DIR>          ..
> 04/08/2011  12:50p      <DIR>          configs
> 04/08/2011  12:21p           2,395,687 deu.traineddata
> 10/03/2010  08:01a           1,926,792 eng.traineddata
> 04/08/2011  12:24p           2,292,872 fra.traineddata
> 04/08/2011  12:27p           2,434,628 ita.traineddata
> 04/08/2011  12:29p           2,281,434 spa.traineddata
> 04/08/2011  12:50p      <DIR>          tessconfigs
>                5 File(s)     11,331,413 bytes
>                4 Dir(s)  47,724,969,984 bytes free
>
> (no unichar type files)
>
> Now the error is back to:
>
> C:\tesseract\Tesseract-OCR>tesseract ocr_107.tif beglat
> Error openning data file C:\Program Files\Tesseract-OCR\tessdata/
> eng.traineddata
>


> Well behaved w32 apps like emacs and gnuw32 utilities don't tell
> Windows about themselves, why does tesseract have to?
>

The installer sets the user environment variable TESSDATA_PREFIX to the
installation directory (on Windows XP you can access it this way: My
Computer -> Properties -> Advanced -> Environment Variables). See [1].
Your posts indicate that you moved tesseract to another place (i.e. you broke
your installation). Now you blame tesseract ;-)

The error "Error openning data file" means that tesseract cannot find the
requested data file, for one of these reasons:

   1. TESSDATA_PREFIX points to the wrong place - you can check it on the
   command line (after you have received this error) with the command:
   echo %TESSDATA_PREFIX%
   2. TESSDATA_PREFIX points to the correct place, but the file does not exist.
   You can check it with the command:
   dir "%TESSDATA_PREFIX%tessdata"

There is a report that if you change or remove TESSDATA_PREFIX (via regedit or
via My Computer -> Properties ->...) you need to restart the computer. If
you need to change it just for the currently open command line session, you
can do it with the command (no quotes, otherwise they become part of the value):
set TESSDATA_PREFIX=your desired path\

[1]
http://code.google.com/p/tesseract-ocr/source/browse/trunk/vs2008/tesseract.nsi#210
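
The two checks above can also be scripted. The following is a minimal Python sketch, not part of tesseract; the function names are made up, and the path layout (prefix + "tessdata/" + language + ".traineddata") is inferred from the error message in this thread rather than taken from the tesseract source.

```python
import os


def traineddata_path(prefix, lang="eng"):
    # Assumed layout, inferred from the error message in this thread:
    # <TESSDATA_PREFIX>tessdata/<lang>.traineddata
    return prefix + "tessdata/" + lang + ".traineddata"


def diagnose(prefix, lang="eng"):
    """Report which of the two cases described above applies."""
    if not prefix:
        return "TESSDATA_PREFIX is not set"
    if not os.path.isdir(prefix + "tessdata"):
        return "wrong prefix: no tessdata directory under " + prefix
    path = traineddata_path(prefix, lang)
    if not os.path.isfile(path):
        return "missing file: " + path
    return "ok: " + path


if __name__ == "__main__":
    print(diagnose(os.environ.get("TESSDATA_PREFIX", "")))
```

Note that the prefix is expected to end with a path separator, as in the "set" example above.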


--
Zdenko


>
>
> On Apr 12, 6:59 pm, caudex  wrote:
> > After installing tesseract-ocr 3.0 successfully and running it
> > against  3 or 4 pdfs, I now get the following error
> >
> > C:\tesseract\Tesseract-OCR>tesseract ocr_107.tif beglat
> > Error openning data file C:\Program Files\Tesseract-OCR\tessdata/
> > eng.traineddata
> >
> > A dir on ...\tessdata shows:
> >
> > 10/03/2010  08:01a   1,926,792 eng.traineddata
> >
> > Notice the misspelling of openning and the / instead of \ in the
> > qualified path to eng.traineddata.
> >
> > Does any of you have a clue what could be going wrong here after it
> > worked correctly a few times?
> > I see that tesseract is looking for the tessdata subdirectory in the
> > wrong place (Program Files) instead of the current directory (where
> > the .tif's were created) but how did it work the first three times?
> > Under program files there is no tesseract-ocr subdirectory.
> >
> > Thanks,
> >
> > Ed
>



Re: Build files for the Tesseract OCR for android (Windows Xp)

2011-04-18 Thread zdenko podobny
Hi,

The commands mentioned in the README are very simple and can be replaced by
other Windows programs (e.g. for unpacking) or by gnuwin32 tools. If you
cannot do that by yourself, then I really suggest you invest your time in
something else. You will face much more difficult tasks: as far as I know,
nobody has been able to build tesseract on Windows with GNU tools.

Zdenko

On Mon, Apr 18, 2011 at 1:45 PM, jeni  wrote:

> I have downloaded
>
> tesseract-android-tools
>
> http://code.google.com/p/tesseract-android-tools/
>
> From the READ ME file :
>
>
> http://tesseract-android-tools.googlecode.com/svn-history/r4/trunk/tesseract-android-tools/README
>
> I tried to build the tesseract, but the commands are given in READ ME are
> linux commands... I need to build it for Windows Xp..Please Help
>


Re: Difficulties to use Tesseract

2011-04-24 Thread zdenko podobny
Hello,

I use tesseract on Mandriva Linux without problems. But I compiled it
myself ;-) I am not satisfied with the packages provided by the Mandriva
team ;-) (e.g. they included tesseract 3.00 in cooker but without the English
language data, and they did not include the leptonica library, which is used
for image handling...)

If you are able to compile software yourself, then try the latest svn
version [1]. Do not forget to install leptonica [2] first ;-).

[1] http://code.google.com/p/tesseract-ocr/wiki/TesseractSvnInstallation
[2] http://leptonica.org/download.html
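
Before compiling, it may help to check which of the image libraries are visible on the system. The sketch below uses Python's ctypes.util.find_library; the library names ("lept" for leptonica plus tiff, png, jpeg and z, as listed in Oleg's reply in this thread) are assumptions and may differ between distributions. It only looks for shared libraries, not for the development headers that the build also needs.

```python
from ctypes.util import find_library

# Assumed short library names; adjust them for your distribution.
LIBS = ("lept", "tiff", "png", "jpeg", "z")


def missing_libraries(names=LIBS):
    """Return the names for which no shared library could be located."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("not found:", ", ".join(missing))
    else:
        print("all libraries found")
```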

--
Zdenko


On Sun, Apr 24, 2011 at 5:43 PM, Giby_the_kid wrote:

> Some years ago, I was using tesseract and was very satisfied of the
> result, I tried to use it on a new computer, and it doesn't work.
>
> [papa@localhost ~]$ tesseract out.tiff text.txt -l fra
> Tesseract Open Source OCR Engine
> name_to_image_type:Error:Unrecognized image type:out.tiff
> IMAGE::read_header:Error:Can't read this image type:out.tiff
> tesseract:Error:Read of file failed:out.tiff
> Erreur de segmentation (core dumped)
> [papa@localhost ~]$
>
> Have I forgotten anything? should I install something else to make it
> work properly?
>
> OS mandriva 2010.2-64 bit, tesseract: 2.04 i586 with tesseract-fra
>
> thanks
>


Re: Difficulties to use Tesseract

2011-04-24 Thread zdenko podobny
Did you recompile tesseract? Can you send your out.tiff? (Search the forum for
problems/limitations with tiff images.)

--
Zdenko

On Sun, Apr 24, 2011 at 7:21 PM, Giby_the_kid wrote:

> Then, after checking, libtiff is installed... I installed Leptonica,
> but it still does not work :(
>
> On 24 avr, 18:41, Oleg Tikhonov  wrote:
> > Hi,
> > I don't know what you've actually installed, but it seems that you've
> forgotten
> > libtiff.
> > Generally, tesseract depends on leptonica, and leptonica itself depends
> on:
> > libz, libpng, libtiff, libjpeg.
> >
> > Make sure you have them all.
> >
> > Oleg
> >
> > On Sun, Apr 24, 2011 at 6:43 PM, Giby_the_kid <
> g.benjamin.le...@gmail.com>wrote:
> >
> > > Some years ago, I was using tesseract and was very satisfied of the
> > > result, I tried to use it on a new computer, and it doesn't work.
> >
> > > [papa@localhost ~]$ tesseract out.tiff text.txt -l fra
> > > Tesseract Open Source OCR Engine
> > > name_to_image_type:Error:Unrecognized image type:out.tiff
> > > IMAGE::read_header:Error:Can't read this image type:out.tiff
> > > tesseract:Error:Read of file failed:out.tiff
> > > Erreur de segmentation (core dumped)
> > > [papa@localhost ~]$
> >
> > > Have I forgotten anything? should I install something else to make it
> > > work properly?
> >
> > > OS mandriva 2010.2-64 bit, tesseract: 2.04 i586 with tesseract-fra
> >
> > > thanks
> >


Re: creating train data set for Korean

2011-04-28 Thread zdenko podobny
On Thu, Apr 28, 2011 at 6:03 PM, Oleg Tikhonov wrote:

> Hi guys,
>
> I've installed tesseract-ocr 3.0 on Windows 7. All work fine if selected
> language is English.
> I tried to add/teach the system the Korean. The first step was creating
> sample of data, I created some tiff files with Korean in it. After, I ran
> tesseract command:
> tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num]
> batch.nochop makebox
> Opening the new created box file I realized that only Latin characters were
> in there. What's wrong?
>

Nothing is wrong ;-) If you did not specify a language (with the -l option, see
[1]), tesseract uses the default language: English. And as far as I know English
uses Latin characters only. So try to add '-l kor' to your command (but do
not forget to install [2]).
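
As a small illustration, the makebox command with the language option could be assembled like this in Python. It is only a sketch: the helper name and the image name kor.myfont.exp0.tif are hypothetical, and the argument order simply follows the command quoted above with '-l kor' added.

```python
import os


def makebox_command(image, lang, exe="tesseract"):
    """Build the argument list for bootstrapping a box file."""
    # The output base is the image name without its last extension,
    # e.g. kor.myfont.exp0.tif -> kor.myfont.exp0
    base = os.path.splitext(image)[0]
    return [exe, image, base, "-l", lang, "batch.nochop", "makebox"]


if __name__ == "__main__":
    print(" ".join(makebox_command("kor.myfont.exp0.tif", "kor")))
```

The resulting list can be passed to subprocess.run() once tesseract and the kor language data are installed.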


> Might be I have to change a system language?
>

As far as I know tesseract does not care about the system language.


> Please advise me how anyway to create a training data set? Thank you in
> advance,
>
>
The general rules are written here [3]. I suggest following them closely. Have a
look at the provided boxtiff files [4] for spa, eng, deu, ita, fra, nld as
examples.

There was an attempt at automatic training [5], but since the project
(tesseractindic) moved to github I cannot find the folder
(tesseract_trainer) in the source code anymore.

Last advice: share your experiences with others ;-)

Zdenko

[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Bootstrapping_a_new_character_set
[2] http://tesseract-ocr.googlecode.com/files/kor.traineddata.gz
[3]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
[4] http://code.google.com/p/tesseract-ocr/downloads/list
[5]
http://code.google.com/p/tesseractindic/source/browse/#svn%2Ftrunk%2Ftesseract_trainer


> Oleg
>


Re: creating train data set for Korean

2011-04-29 Thread zdenko podobny
Oleg,

Are you sure about the message? "tesseract.exe" indicates that you are using
Windows... (I am not aware of any official linux build system that creates
'tesseract.exe'.) But part of the error message ('/usr/share/tessdata/')
indicates that you are in a linux (or unix-like) environment...

You wrote that you installed 'tesseract-ocr 3.0 on Windows 7'. But the error
message indicates that you are using tesseract 2.0x. E.g. when I tried
tesseract 2.04 (on Windows XP):

t204\tesseract.exe annyong_eng.png annyong_eng -l dummy

I got the message:

Unable to load unicharset file C:\Program
Files\Tesseract-OCR\tessdata/dummy.unicharset


When I tried tesseract 3.00:

tesseract.exe annyong_eng.png annyong_eng -l dummy

I got the message:

Error openning data file C:\Program
Files\Tesseract-OCR\tessdata/dummy.traineddata
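
The difference between the two messages can be turned into a quick check. This Python sketch only encodes the two messages shown in this thread (2.0x complains about a .unicharset file, 3.00 about a .traineddata file); the function name is made up, and other versions may print something different.

```python
def guess_tesseract_version(error_message):
    """Guess the tesseract series from a missing-language error message."""
    message = error_message.strip()
    if "Unable to load unicharset file" in message:
        return "2.0x"
    if "Error openning data file" in message and message.endswith(".traineddata"):
        return "3.00"
    return "unknown"


if __name__ == "__main__":
    reported = "Unable to load unicharset file /usr/share/tessdata/kor.unicharset"
    print(guess_tesseract_version(reported))
```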


How did you install tesseract?

Zdenko

2011/4/29 Oleg Tikhonov 

> Zdenko,
> Honestly, I did not read a whole page, beg your pardon.
>
> Here is a command and the error/message
>
> $ tesseract.exe ../korean_training/annyong_eng.png
> ../korean_training/annyong_eng.png -l kor batch.nochop makebox
>
> Unable to load unicharset file /usr/share/tessdata/kor.unicharset
>
> Thanks,
>
> --Oleg
>
> 2011/4/29 zdenko podobny 
>
>> 2011/4/29 Oleg Tikhonov 
>>
>>> Zdenko, Quan and Sven,
>>> Thanks a lot for your suggestions, I think you nailed the problem,
>>> So, I installed the Korean language pack :-) however an archive has only
>>> one file - kor.traineddata.
>>> It doesn't have kor.unicharset, it causes a problem that during "loading"
>>> kor.traineddata, tesseract also depends on kor.unicharset.
>>>
>>
>>  Did you read the whole of [1] (up to the bottom)?
>>
>>  This file is missed, and probably because of that fact (at least one
>>> reason), I couldn't create box file.
>>>
>>
>> kor.unicharset is there. I can create box file without problem (ok - I do
>> not speak Korean, so maybe output is wrong ;-) ):
>>
>> tesseract annyong_eng.png annyong_eng -l kor batch.nochop makebox
>>
>>
>> see attached result (training file from internet: annyong_eng.png, created
>> box file annyong_eng.box and screenshot from box editor: screenshot.png)
>>
>>
>>> I tried to find that file, but without success. What I'm going to do, is
>>> to create by myself kor.unicharset. I'll look at eng.unicharset to have some
>>> comprehension what is a structure.
>>>
>>>
>> Please post the error message/details - it is the best way to communicate if
>> you need help. kor.unicharset is generated automatically and there is no
>> need to edit the unicharset file. It is written in [1]. Did you read it? You
>> can save a lot of time by carefully reading the documentation ;-)
>>
>> BR,
>>
>> Zdenko
>>
>> [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>
>>
>> And of course I'll change the training set according to the Quan/Sven
>>> suggestions.
>>>
>>>
>> -- Oleg
>>>
>>>
>>>
>>> 2011/4/29 Sven Pedersen 
>>>
>>>> Hi Oleg,
>>>> As Quan said, you need a higher resolution image, about 200--300 dpi
>>>> and it needs to be binary (black&white) not grayscale or color.
>>>> Screenshots are typically only 72 -- 90 dpi. I see that the wiki says
>>>> the character size in pixels in a confusing way.
>>>> --Sven
>>>>
>>>>
>>>> 2011/4/28 Quan Nguyen :
>>>> > Print screens are, in general, not adequate for training new
>>>> > languages. You'd be better off using GIMP to produce your TIFF images.
>>>> > Be sure to specify the language to bootstrap the new charset, such as:
>>>> >
>>>> > $ tesseract.exe ../korean_training/kor.ariel.exp1.tif ../
>>>> > korean_training/kor.ariel.exp1 -l kor batch.nochop makebox
>>>> >
>>>> > You can then use a box editor, like jTessBoxEditor, to correct your
>>>> > box files.
>>>> >
>>>> > On Apr 28, 1:06 pm, Oleg Tikhonov  wrote:
>>>> >> Hi Sven,
>>>> >>
>>>> >> Here is what I've done:
>>>> >> 1. Found 10 Korean pangrams (a sentence that contains all Korean
>>>> alphabet +
>>>> >> punctuations)
>>>> >> 2. Opened notepad++ and pasted line by line each pangram mixed up
>>>> with
>>>> >> punctuation, changed encoding to ut

Re: Deskew waves in a document

2011-05-07 Thread zdenko podobny
Hi,

I am not sure if I understood your problem (e.g. if you are looking for a
"dewarp" ("straighten text lines") feature). In leptonica there are example
programs for dewarping: dewarp_reg.c and dewarptest.c. I tried it on one of
my projects, but it did not work on my images (I plan to play with it
later ;-) ).

On http://diybookscanner.org I found references to the (commercial) program
Book Restorer [1]. I had a chance to test it and I can confirm it worked
perfectly, if you need straightened text lines in the output.

I have experience only with these. But if you google for "dewarp" or "Image
straightening algorithm" you can find a lot of interesting suggestions
for algorithms or programs ([2], [3]).

[1]
http://www.i2s-bookscanner.com/produits.asp?gamme=1011&sX_Menu_selectedID=left_1011_GEN
[2] http://rsb.info.nih.gov/ij/plugins/straighten.html
[3] http://stackoverflow.com/questions/4783136/image-straightening-algorithm

--
Zdenko

On Fri, May 6, 2011 at 8:49 PM, Patrick Collins  wrote:

> Hi,
> I am trying to scan a series of documents which have been badly skewed by
> the book's edge. Has anyone seen any commercial or open sources
> implementations of deskewing software which can handle advanced deskew's
> like this?
>
> Patrick.
>


Re: Deskew waves in a document

2011-05-07 Thread zdenko podobny
Here is the link to the leptonica dewarping documentation:
http://tpgit.github.com/UnOfficialLeptDocs/leptonica/dewarping.html

Zdenko

On Sat, May 7, 2011 at 9:19 AM, zdenko podobny  wrote:

> Hi,
>
> I am not sure if I understood your problem (e.g. if you are looking for
>  "dewarp" ("straighten text line") feature. In leptonica there are example
> programs for dewarping: dewarp_reg.c and dewarptest.c. I try it to on one of
> my project, but it did not worked on my images (e.g. I plan to play with it
> later ;-) )
>
> On  http://diybookscanner.org I found references for (commercial) program
> Book Restorer [1]. I had change to test it and I can proof it worked perfect
> - if you need to straight text lines in output.
>
> I have experience only with these. But if you google for "dewarp" or "Image
> straightening algorithm" you can find a lot of interesting suggestions
> for algorithm or programs ([2], [3])
>
> [1]
> http://www.i2s-bookscanner.com/produits.asp?gamme=1011&sX_Menu_selectedID=left_1011_GEN
> [2] http://rsb.info.nih.gov/ij/plugins/straighten.html
> [3]
> http://stackoverflow.com/questions/4783136/image-straightening-algorithm
>
> --
> Zdenko
>
> On Fri, May 6, 2011 at 8:49 PM, Patrick Collins wrote:
>
>> Hi,
>> I am trying to scan a series of documents which have been badly skewed by
>> the book's edge. Has anyone seen any commercial or open sources
>> implementations of deskewing software which can handle advanced deskew's
>> like this?
>>
>> Patrick.
>>


Re: Custom Wordlist without Retraining

2011-05-08 Thread zdenko podobny
See [1], or user-words on the same page.

[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together

Zdenko

On Sun, May 8, 2011 at 5:53 PM, Max Cantor  wrote:

> Is there a way to set up a custom wordlist without going through the entire
> retraining process?  our wordlists will change a bit at runtime, so if there
> is an API variable to set, that would be perfect for us.
>
> Thanks,
> Max
>
> Keep up the good work!
>


Re: Custom Wordlist without Retraining

2011-05-08 Thread zdenko podobny
Please try to read [1] (just looking is not enough ;-) ):

// Specify option -u to unpack all the components to the specified path:
//
// combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
//
// This will create  /home/$USER/temp/eng.* files with individual tessdata
// components from tessdata/eng.traineddata.
//

[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together
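
To make the role of the trailing dot in the output prefix clearer, here is a small Python sketch. It only builds the command line quoted above and lists whatever files already exist under that prefix; the helper names are made up and the component names depend on what combine_tessdata actually writes.

```python
import os


def unpack_command(traineddata, out_prefix, exe="combine_tessdata"):
    """Argument list for unpacking all components, as in the quoted comment."""
    return [exe, "-u", traineddata, out_prefix]


def unpacked_components(out_prefix):
    """List component names found under out_prefix (e.g. '/tmp/eng.')."""
    directory, stem = os.path.split(out_prefix)
    names = os.listdir(directory or ".")
    # Each component file is named <out_prefix><component>, so the prefix
    # must end with the dot that follows the language code.
    return sorted(name[len(stem):] for name in names if name.startswith(stem))


if __name__ == "__main__":
    print(" ".join(unpack_command("tessdata/eng.traineddata", "/tmp/eng.")))
```

After running the printed command, unpacked_components("/tmp/eng.") should show which individual files are available for editing.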

On Mon, May 9, 2011 at 2:01 AM, Max Cantor  wrote:

> I was looking at that, but can't find the other component files in the
> source tree.  is there somewhere to get the component files for the
> eng.trainneddata?
>
> sorry if i'm missing something obvious...
>
> max
> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>
> > see [1] or user-words on the same page.
> >
> > [1]
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together
> >
> > Zdenko
> >
> > On Sun, May 8, 2011 at 5:53 PM, Max Cantor  wrote:
> > Is there a way to set up a custom wordlist without going through the
> entire retraining process?  our wordlists will change a bit at runtime, so
> if there is an API variable to set, that would be perfect for us.
> >
> > Thanks,
> > Max
> >
> > Keep up the good work!
> >


Re: Custom Wordlist without Retraining

2011-05-09 Thread zdenko podobny
no problem :-) I think you will like option "-o" too.

Zdenko

On Mon, May 9, 2011 at 8:27 AM, Max Cantor  wrote:

> I feel really dumb now. Sorry for the bother.
>
>
> Thanks, max
>
> On May 9, 2011, at 14:01, zdenko podobny  wrote:
>
> Please try to read (to look is not enough ;-) ) [1]:
>
> // Specify option -u to unpack all the components to the specified path:
> //
> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
> //
> // This will create /home/$USER/temp/eng.* files with individual tessdata
> // components from tessdata/eng.traineddata.
> //
>
> [1]
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together
>
> On Mon, May 9, 2011 at 2:01 AM, Max Cantor < 
> mxcan...@gmail.com> wrote:
>
>> I was looking at that, but I can't find the other component files in the
>> source tree. Is there somewhere to get the component files for
>> eng.traineddata?
>>
>> sorry if i'm missing something obvious...
>>
>> max
>> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>>
>> > see [1] or user-words on the same page.
>> >
>> > [1]
>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Putting_it_all_together
>> >
>> > Zdenko
>> >
>> > On Sun, May 8, 2011 at 5:53 PM, Max Cantor < 
>> mxcan...@gmail.com> wrote:
>> > Is there a way to set up a custom wordlist without going through the
>> entire retraining process?  our wordlists will change a bit at runtime, so
>> if there is an API variable to set, that would be perfect for us.
>> >
>> > Thanks,
>> > Max
>> >
>> > Keep up the good work!
>> >
>>
>
>
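
The unpack step quoted above, and the "-o" option mentioned at the top of this message, can be sketched as two small command builders. This is only an illustration: the helper names, the /tmp/eng. prefix and the choice of eng.word-dawg as the edited component are assumptions, and nothing here runs combine_tessdata itself.

```python
def unpack_command(traineddata, out_prefix):
    """Command line that unpacks all components of traineddata to out_prefix*."""
    return ["combine_tessdata", "-u", traineddata, out_prefix]


def overwrite_command(traineddata, components):
    """Command line that writes the given component files back into traineddata."""
    return ["combine_tessdata", "-o", traineddata] + list(components)


if __name__ == "__main__":
    # Paths are placeholders; adjust them to your own tessdata location.
    print(" ".join(unpack_command("tessdata/eng.traineddata", "/tmp/eng.")))
    print(" ".join(overwrite_command("tessdata/eng.traineddata",
                                     ["/tmp/eng.word-dawg"])))
```

A wordlist change would then presumably be: unpack, rebuild the dawg from the new list with wordlist2dawg, and write it back with -o.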



Re: Catalan language

2011-05-11 Thread zdenko podobny
On Wed, May 11, 2011 at 9:22 PM, jinglada  wrote:

> In the /usr/share/tesseract-ocr/tessdata I have the following files:
>
> cat.DangAmbigs  spa.DangAmbigs  eng.DangAmbigs  fra.DangAmbigs  por.DangAmbigs
> cat.freq-dawg   spa.freq-dawg   eng.freq-dawg   fra.freq-dawg   por.freq-dawg
> cat.inttemp     spa.inttemp     eng.inttemp     fra.inttemp     por.inttemp
> cat.normproto   spa.normproto   eng.normproto   fra.normproto   por.normproto
> cat.pffmtable   spa.pffmtable   eng.pffmtable   fra.pffmtable   por.pffmtable
> cat.unicharset  spa.unicharset  eng.unicharset  fra.unicharset  por.unicharset
> cat.user-words  spa.user-words  eng.user-words  fra.user-words  por.user-words
> cat.word-dawg   spa.word-dawg   eng.word-dawg   fra.word-dawg   por.word-dawg
>
> but the program only shows Portuguese, Spanish, French, English
>
Which program? Which version are you trying to use? Where does it show that?


> What I have to do to activate Catalan (cat.) language?
>

Tesseract does not need a language to be activated.
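
As a rough way to check this from the outside, the sketch below lists the languages for which a tessdata directory contains a <lang>.unicharset file (matching the split files shown in the question) and builds a command line that selects one of them with -l. The helper names and the page.tif / page file names are made up for the example, and nothing here runs tesseract.

```python
from pathlib import Path
import tempfile


def available_languages(tessdata_dir):
    """Language codes that have a <lang>.unicharset file in tessdata_dir."""
    suffix = ".unicharset"
    return sorted(p.name[:-len(suffix)]
                  for p in Path(tessdata_dir).glob("*" + suffix))


def ocr_command(image, output_base, lang):
    """Command line that runs tesseract with an explicitly chosen language."""
    return ["tesseract", image, output_base, "-l", lang]


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real tessdata folder.
    with tempfile.TemporaryDirectory() as tessdata:
        for lang in ("cat", "eng", "spa"):
            (Path(tessdata) / (lang + ".unicharset")).touch()
        print(available_languages(tessdata))
        print(" ".join(ocr_command("page.tif", "page", "cat")))
```

If "cat" is in the list, passing -l cat on the command line should be all that is needed; a front-end that shows only some languages is keeping its own list.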


> Thanks in advance.
>



Re: mftraining produces "Missing font_properties"

2011-05-17 Thread zdenko podobny
On Tue, May 17, 2011 at 9:08 AM, Eyal  wrote:

> Hi,
>
> I tried to train some letters, and when I ran mftraining with the
> parameters:
> mftraining -U unicharset -O lang.unicharset font1.tr
> I received an error message: "Missing font_properties".
>
> I'm working on Windows 7 with Visual Studio 2010.
>
> When I use the already compiled mftraining.exe for Windows I do NOT get
> this error, and I get decent results from the trained file.
>
> Just to be sure the text I have is not problematic, I did the same test on
> eurotext.tif and I'm still getting the error.
>
> Any clue?
>
Yes, I have a clue: you neither read the documentation [1] nor googled for
a solution ;-)

Zdenko

[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01)


> Thank you!
>



Re: mftraining produces "Missing font_properties"

2011-05-17 Thread zdenko podobny
On Tue, May 17, 2011 at 11:58 AM, Eyal  wrote:

> Quite a good guess, but I'm very disappointed to say - I DID read the
> documentation...
>
> And I even ran the following command:
>
> mftraining -F font_properties -U unicharset font1.tr
>
> And I got results which don't show any error:
>
> Reading font1.tr ...
> Writing Merged Microfeat ...Done!
>
> The font_properties file contains one line as follows:
>
> font1 0 0 0 0 0
>
> And then I ran the command:
>
> mftraining -U unicharset -O lang1.unicharset font1.tr
>
> And I'm getting the following results:
>
> Reading font1.tr ...
> font1 has no defined properties.
> !"Missing font_properties entry is a fatal error!":Error:Assert failed:in
> file ..\training\mftraining.cpp, line 287
>
> Another guess?
>
Why did you not run 'mftraining -F font_properties -U unicharset -O
lang1.unicharset font1.tr'?
The documentation says that font_properties is required for 3.01 (i.e. you do
not need '-F font_properties' for 3.00, but you do need it for 3.01).

BTW: 'mftraining --help' will show other options for mftraining

Zdenko
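
The point above, that every mftraining call on 3.01 needs -F, including the one that uses -O, can be sketched with a small command builder. The function name and defaults are invented for the example; only the flags and file names come from this thread.

```python
def mftraining_command(tr_files, unicharset="unicharset",
                       font_properties="font_properties",
                       output_unicharset=None):
    """Build an mftraining command line that always passes -F font_properties."""
    cmd = ["mftraining", "-F", font_properties, "-U", unicharset]
    if output_unicharset is not None:
        # -O is optional, but -F stays in place when it is used.
        cmd += ["-O", output_unicharset]
    return cmd + list(tr_files)


if __name__ == "__main__":
    print(" ".join(mftraining_command(["font1.tr"],
                                      output_unicharset="lang1.unicharset")))
```

Building the command in one place makes it harder to drop -F by accident when another option is added.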





Re: About the jpn.traindata

2011-05-17 Thread zdenko podobny
On Tue, May 17, 2011 at 5:01 PM, Илья  wrote:

> IMHO alphabets can't be protected by copyright.
>
Mostafa did not ask for an alphabet. He asked for 'all the tif files that
used for creating...', and the content of a TIFF file (e.g. scanned books)
could be protected by copyright.


 --
> Best regards,
> Ilia.
>
>
> On Tue, 17/05/2011 at 09:24 -0400, Dmitri Silaev wrote:
> > I think copyright issues are preventing the dev team from publishing
> > these source files. However you can try to contact this forum's
> > moderator directly - he probably can take decision to share.
> >
> > --
> > Dmitri
> >
> >
> >
> >
> >
> > On Tue, May 17, 2011 at 4:58 AM, Mostafa 
> wrote:
> > > Hi,
> > >
> > > I am interested to get all the tif files that used for creating the
> > > jpn.traindata.
> > > I just want to see how many characters are supported in that file.
> > > Because I have some other Japanese characters that can't be recognized
> > > by
> > > the tesseract OCR.
> > >
> > > Does anybody know, where are those tif files ?
> > >
> > > Thanks
> > >



Re: Issue 490 - Exception while training with mftraining & cntraining

2011-05-19 Thread zdenko podobny
On Thu, May 19, 2011 at 9:38 AM, Eyal  wrote:

> I've opened this issue:
> http://code.google.com/p/tesseract-ocr/issues/detail?id=490&start=100
> Afterward I noticed that there is already a similar issue, 382:
> http://code.google.com/p/tesseract-ocr/issues/detail?id=382
>
> How can I know if it was solved? should there be a comment over it? I
> didn't see any.
>
If there were a solution, there would be a remark or comment about it. This
issue is still open.


> I'm not sure if I should combine the 2 issues together.
>
The same error at the same stage of training => one of the issues should be
marked as a duplicate.


> Thank you,
>
> Eyal
>



Re: Issue 490 - Exception while training with mftraining & cntraining

2011-05-19 Thread zdenko podobny
I did it.

Zdenko

On Thu, May 19, 2011 at 11:52 AM, Eyal  wrote:

> I didn't find a way to mark an issue as duplicate.
>



Re: About the jpn.traindata

2011-05-19 Thread zdenko podobny
2011/5/19 Mostafa 

> Hi Again,
>
> Seems no body knows where it is hiding.
> Should I contact with CIA agent ? lol
>

If somebody is really interested, she/he can find the answer ;-). Within 1
minute ;-) ([1] [2] [3]). BTW: there is a Developers forum
<http://groups.google.com/group/tesseract-dev>.


> But I am kinda serious about the data.
>

There were several requests for the training data (in the forum, in issues). I
asked too. There was no official reply to such requests. AFAIK Google is
not obliged to release them, so I guess they have a reason for not providing
them.

On the other hand, this could be an opportunity for the tesseract community :-):
to create an alternative training set. As Ray mentioned ([3]), they use a "more
automated training process based on rendering text from fonts", so training
based on "real world" scanned documents could be interesting (but more
difficult).


Zdenko

[1] http://code.google.com/p/tesseract-ocr/people/list
[2] http://code.google.com/p/tesseract-ocr/source/list
[3] http://groups.google.com/group/tesseract-dev/msg/1cdf3ebe8743d935


>  Mostafa
>
> On May 18, 2:43 am, Илья  wrote:
> > He need for table that contains all supported alphabetics characters.
> > Also, Parts of scanned books could not be protected by copyright.
> >
> > Can you give any contacts of "jpn.traindata" dev team?
> >
> > --
> > Best regards,
> >  Ilia.
> >
> > On Tue, 17/05/2011 at 18:24 +0200, zdenko podobny wrote:
> >
> > > On Tue, May 17, 2011 at 5:01 PM, Илья  wrote:
> > > IMHO alphabets can't be protected by copyright.
> >
> > > Mostafa did not asked for an alphabets. He asked for 'all the tif
> > > files that used for creating...' and content of tiff file (e.g.
> > > scanned books) could be protected by copyright.
> >
> > > --
> > > Best regards,
> > > Ilia.
> >
> > > On Tue, 17/05/2011 at 09:24 -0400, Dmitri Silaev wrote:
> >
> > > > I think copyright issues are preventing the dev team from
> > > publishing
> > > > these source files. However you can try to contact this
> > > forum's
> > > > moderator directly - he probably can take decision to share.
> >
> > > > --
> > > > Dmitri
> >
> > > > On Tue, May 17, 2011 at 4:58 AM, Mostafa
> > >  wrote:
> > > > > Hi,
> >
> > > > > I am interested to get all the tif files that used for
> > > creating the
> > > > >jpn.traindata.
> > > > > I just want to see how many characters are supported in
> > > that file.
> > > > > Because I have some other Japanese characters that can't
> > > be recognized
> > > > > by
> > > > > the tesseract OCR.
> >
> > > > > Does anybody know, where are those tif files ?
> >
> > > > > Thanks
> >



Re: mftraining produces "Missing font_properties"

2011-05-19 Thread zdenko podobny
On Wed, May 18, 2011 at 1:15 PM, Eyal  wrote:

> WOW!!!
>
> It worked.
>
> If you'll look again at the training manual, you'll see that there wasn't a
> combination of both -F & -O and that's why I didn't write such command.
>
I will try to improve the wiki pages (e.g. AddOns) in the next few days. If you
or others have a project that uses tesseract, or if you have found bugs or
mistakes on the wiki, just let me know ;-)

--
Zdenko



Re: can't compile tesseract on win7 with visual C++ 2010 express

2011-05-20 Thread zdenko podobny
Hi,

It is written on the main page: the supported platform is Windows (x86/32) with
Visual C++ Express 2008 [1]. As I heard, it is not a big problem to compile
the VS2008 project files (in directory vs2008) in VS2010 ;-)

In the svn version there is also initial support for VS2010 (directory vs2010),
created by Michael Lutz. But there are some issues that need to be fixed (e.g.
the debug version is not working at the moment; the release version should be
OK). The SVN version should be used by developers...

Because I still use VS 2008 (= I cannot help with VS2010 issues), patches
and improvements for VS2010 are welcome.

Zdenko

[1] http://code.google.com/p/tesseract-ocr/#Supported_Platforms

On Fri, May 20, 2011 at 1:35 PM, barroque  wrote:

> Hi there,
>
> I'm a newbie on OCR. I tried to play with tesseract but encountered
> some problems whiling compiling tesseract (latest rev. 582). by Visual
> C++ 2010 express, the error messages are like the following.  Though
> the tesseract.exe could be generated, but I'm not sure what I'm
> missing here... Moreover, when I try to compile the debug version,
> more error pumped and no tesseract.exe generated.   Can you please
> help?  Thanks in advance.
>
>
> Release version
> 
> 15>  c:\jed\devel\tesseract-ocr-read-only\vs2010\include
> \leptonica\environ.h(277) : see previous definition of 'snprintf'
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol @__security_check_cookie@4
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__towlower
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__iswupper
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__iswlower
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__printf
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__fopen
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__iswalpha
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol "void __cdecl operator delete(void *)" (??3@YAXPAX@Z)
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__setlocale
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__towupper
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__fclose
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__iswpunct
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__exit
> 18>unicharset_extractor.obj : error LNK2001: unresolved external
> symbol __imp__iswdigit
> 18>LINK : error LNK2001: unresolved external symbol _mainCRTStartup
> 18>ccutil.lib(strngs.obj) : error LNK2001: unresolved external symbol
> __imp__strchr
> 18>ccutil.lib(strngs.obj) : error LNK2001: unresolved external symbol
> __imp___snprintf
> 18>ccutil.lib(boxread.obj) : error LNK2001: unresolved external symbol
> __imp__sscanf
> 18>ccutil.lib(boxread.obj) : error LNK2001: unresolved external symbol
> __imp__strncpy
> 18>ccutil.lib(boxread.obj) : error LNK2001: unresolved external symbol
> __imp__fgets
> 18>ccutil.lib(boxread.obj) : error LNK2001: unresolved external symbol
> __imp__strrchr
> 18>ccutil.lib(unicharset.obj) : error LNK2001: unresolved external
> symbol __imp__sprintf
> 18>ccutil.lib(unicharset.obj) : error LNK2001: unresolved external
> symbol __impiob_func
> 18>ccutil.lib(unicharset.obj) : error LNK2001: unresolved external
> symbol __imp__strtol
> 18>ccutil.lib(unicharset.obj) : error LNK2001: unresolved external
> symbol __imp__fprintf
> 18>ccutil.lib(unicharset.obj) : error LNK2001: unresolved external
> symbol "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z)
> 18>ccutil.lib(memry.obj) : error LNK2001: unresolved external symbol
> __imp__free
> 18>ccutil.lib(memry.obj) : error LNK2001: unresolved external symbol
> __imp__calloc
> 18>ccutil.lib(memry.obj) : error LNK2001: unresolved external symbol
> __imp__malloc
> 18>ccutil.lib(errcode.obj) : error LNK2001: unresolved external symbol
> __imp__abort
> 18>ccutil.lib(errcode.obj) : error LNK2001: unresolved external symbol
> __imp___vsnprintf
> 18>ccutil.lib(tprintf.obj) : error LNK2001: unresolved external symbol
> __imp__vsprintf
> 18>ccutil.lib(tprintf.obj) : error LNK2001: unresolved external symbol
> _atexit
> 18>ccutil.lib(unicharmap.obj) : error LNK2001: unresolved external
> symbol "void __stdcall `eh vector destructor iterator'(void *,unsigned
> int,int,void (__thiscall*)(void *))" (??_M@YGXPAXIHP6EX0@Z@Z)
> 18>ccutil.lib(unicharmap.obj) : error LNK2001: unresolved external
> symbol "void __stdcall `eh vector constructor iterator'(void
> *,unsigned int,int,void (__thiscall*)(void *),void (__thiscall*)(void
> *))" (??_L@YGXPAXIHP6EX0@Z1@Z)
> 18>ccutil.lib(params.obj) : error LNK2001: unresolved external symbol
> __imp__fread
> 18>ccutil.lib(params.obj) : error LNK2001: unresolv

Re: can't compile tesseract on win7 with visual C++ 2010 express

2011-05-20 Thread zdenko podobny
Have a look at http://code.google.com/p/tesseractdotnet/

On Fri, May 20, 2011 at 3:04 PM, Sarel van der Merwe wrote:

> Hi,
>
> Will it be possible to create a dll file so that i can use it inside Visual
> Studio 2010 c#   ?
> I'm really stuck.
>
> Please help
>
> Thanks
>
> Sarel

Re: Tesseract 3.01 Training and Error opening unicharset file

2011-05-21 Thread zdenko podobny
On Fri, May 20, 2011 at 4:44 PM, Holm Dressler
wrote:

> Hi there,
>
> I want to create tessdata files on a given tiff on my Linux system. My
> tiff is called k05.tif
>
> I used the description on
>
> http://aravindavk.in/view/tesseract_ocr_initial_setup
>
>  which means I do the following step by step:
>
>
> 1. tesseract k05.tif k05 batch.nochop makebox
> 2. I clean up the box file with jTessBoxEditor.jar (still have
> problems with special characters like the German ö,ä,ü ...)
>

You can try [1] or other box editors [2] (jTessBoxEditor will be included
there in the next wiki update).

Zdenko

[1] https://github.com/zdenop/qt-box-editor
[2]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Box_File_Editors


> 3. tesseract k05.tif k05 nobatch box.train
> 4. unicharset_extractor k05.box
> 5. cp unicharset k05.unicharset
> 6. echo k05 0 0 0 0 0 > font_properties
> 7. mftraining -F font_properties -U unicharset k05.tr
> 8. mftraining -F font_properties -U unicharset -O k05.unicharset k05.tr
> 9. cntraining k05.tr
> 10. mv Microfeat k05.Microfeat
> 11. mv normproto k05.normproto
> 12. mv pffmtable k05.pffmtable
> 13. mv mfunicharset k05.mfunicharset
> 14. mv inttemp k05.inttemp
> 15. wordlist2dawg frequent_words_list k05.freq-dawg k05.unicharset
>
> Everything works, but combining all the files with
>
> combine_tessdata k05
>
> results in
>
> Error opening unicharset file
>
>
> The file unicharset exists in my directory (in /home/test/training) I
> also renamed the file to k05.unicharset. THE FILE IS NOT EMPTY.
>
> Somebody knows what I am doing wrong?
>
> Thanks for any advice,
>
> Holm
>
>
>
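
One thing that may be worth checking here, offered as an assumption and not as a confirmed diagnosis: the training wiki writes the last step as "combine_tessdata lang." with a trailing dot, which suggests the tool appends each component name directly to the prefix it is given. Under that assumption "combine_tessdata k05" would look for "k05unicharset" and fail even though k05.unicharset exists. The sketch below only checks file names; the component list and the helper name are illustrative.

```python
from pathlib import Path
import tempfile

# Components named in the steps above; the real tool may look for more.
COMPONENTS = ("unicharset", "inttemp", "pffmtable", "normproto")


def missing_components(prefix, directory="."):
    """Components for which directory/<prefix><component> does not exist."""
    return [c for c in COMPONENTS
            if not (Path(directory) / (prefix + c)).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        for component in COMPONENTS:
            (Path(work) / ("k05." + component)).touch()
        print("prefix 'k05' :", missing_components("k05", work))
        print("prefix 'k05.':", missing_components("k05.", work))
```

If the second call reports nothing missing while the first reports everything, running the command with the prefix "k05." is the obvious next thing to try.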



Re: Create traineddata from different tif and box files

2011-05-26 Thread zdenko podobny
Hi,

The problem is that you use the latest version but do not read the latest
manual [1]. If I correctly understood that German manual (via Google
Translate), it is for version 3.00, so it does not follow the changes in
version 3.01.

Another "problem": 3.01 is not released yet. It is for developers and
experienced testers, for testing and bug reporting. IMHO 3.01 training is not
fully documented.

Zdenko

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

On Thu, May 26, 2011 at 10:59 AM, Holm Dressler  wrote:

> Hi there,
>
> I am using Tesseract 3.01 under Linux.
>
> I can successfully create traineddata from one *.tif file. But
> combining different tif / box files gives me an exception.
> Here are the steps:
>
> Let's say I want to create a traineddata from two tif files: 01.tif
> and 02.tif
>
> 1. tesseract 01.tif 01 batch.nochop makebox
> 2. tesseract 02.tif 02 batch.nochop makebox
> 3. I check the two box files using jTessBoxEditor
> 4. tesseract 01.tif 01 nobatch box.train
> 5. tesseract 02.tif 02 nobatch box.train
> 6. As described under
> http://wiki.ubuntuusers.de/tesseract-ocr/tesseract-ocr_trainieren
> (sorry: it is in German, but the commands are the same) I create the
> *.tr files:
> 7. mftraining 01.tr 02.tr
>
> But this results in error: Reading 01.tr ...01 has no defined
> properties.
> !"Missing font_properties entry is a fatal error!":Error:Assert
> failed:in file mftraining.cpp, line 287
> Segmentation fault
>
>
> Also trying to create unicharset with
>
> unicharset_extractor 01.box 02.box
>
> works successfully, but mftraining -U ./unicharset 01.tr 02.tr fails
> with the same error.
>
>
> Does somebody have an idea what I am doing wrong?
>
> Also, searching the group, e.g.
> with the search word "combine", did not turn up any fitting
> solution.
>
> Thanks for any advice,
>
> Holm from Germany
>



Re: Create traineddata from different tif and box files

2011-05-26 Thread zdenko podobny
On Thu, May 26, 2011 at 2:02 PM, Sarel van der Merwe wrote:

> Hi,
>
> Do you know where i can locate the version 3 manual or reference guide
> for Tesseract..

The one I know of is in the download section
(tessdoc-html-3.0.0-preview1.tar.gz) ;-)
Maybe Jimmi will update it for 3.01 :-)
Some good information can be found in the tesseract forums.
All links are on the main project page. Surprisingly ;-)

Zdenko

> Thanks
>
> Sarel
>



Re: Create traineddata from different tif and box files

2011-06-01 Thread zdenko podobny
It is written in the training doc [1]:
"… each .tr filename must match an entry in the font_properties file, or
mftraining will abort."

So you could have saved time by reading the documentation.
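
The rule quoted above can be sketched as a small shell helper that writes one font_properties line per .tr file. This is only an illustration: the function name make_font_properties is made up for this example, the entry name is taken to be the .tr filename without its extension (as in the 01/02 example in this thread), and the all-zero flags simply mirror the values used there. Set the real flags for your fonts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): write one font_properties
# entry per .tr file, so that every .tr filename has a matching line.
make_font_properties() {
    out=$1
    shift
    : > "$out"                      # start with an empty file
    for tr in "$@"; do
        name=$(basename "$tr" .tr)  # 01.tr -> 01
        # Five flag columns, all 0 here, mirroring the values in this thread.
        printf '%s 0 0 0 0 0\n' "$name" >> "$out"
    done
}
```

Called as `make_font_properties font_properties 01.tr 02.tr`, it produces the same two lines as the two echo commands in the quoted message; mftraining would then be run with `-F font_properties` as shown there.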

Zdenko

[1]
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#font_properties_(new_in_3.01)

On Wed, Jun 1, 2011 at 4:33 PM, Holm Dressler
wrote:

> Hi there,
>
> OK, found it out by myself: here are the steps:
>
> 1. Create 01.tr with tesseract 01.tif 01 nobatch box.train
> 2. Create 02.tr with tesseract 02.tif 02 nobatch box.train
> 3. Create unicharset with: unicharset_extractor 01.box 02.box
> 4. Just copy it (maybe it is not necessary): cp unicharset 02.unicharset
> 5. echo 01 0 0 0 0 0 > font_properties
> 6. echo 02 0 0 0 0 0 >> font_properties
> 7. mftraining -F font_properties -U unicharset 01.tr 02.tr
>
> SO YOU SEE:  step 6 was missing (with >>  which means you should have
> two lines in your font_properties)
>
>
> So Jimmi: now it is your turn :-)
>
> Talk soon
>
> Holm

Re: Building Tesseract with VC2008

2011-06-06 Thread zdenko podobny
Hello,

leptonlib* is the Leptonica library. You can download it (version 1.67) from
http://code.google.com/p/leptonica/downloads/list.

Or grab the latest svn version of tesseract (the Leptonica library is included
there). Other issues are solved there as well...

Zdenko


On Tue, Jun 7, 2011 at 2:05 AM, David Amazing  wrote:

> I understand VS2008 is a supported build environment.
>
> I downloaded and extracted the source code and opened the
> \vs2008\tesseract.sln file and started to build it.
>
> However I ended up with six build errors (and 664 warnings):
>
> Error 635  fatal error LNK1104: cannot open file '..\vs2008\lib\leptonlibd.lib'  wordlist2dawg  wordlist2dawg
> Error 636  fatal error LNK1104: cannot open file 'lib\leptonlibd.lib'  tesseract  tesseract
> Error 640  fatal error LNK1104: cannot open file 'lib\leptonlibd.lib'  tessdll  tessdll
> Error 668  fatal error LNK1104: cannot open file '..\vs2008\lib\leptonlibd.lib'  cntraining  cntraining
> Error 669  fatal error LNK1104: cannot open file '..\vs2008\lib\leptonlibd.lib'  mftraining  mftraining
> Error 670  fatal error LNK1104: cannot open file '../bin.dbg/tessdll.lib'  dlltest  dlltest
>
> I couldn't find any downloadable leptonlibd.lib file or any detailed
> build instructions for VC.
>
> Can anyone help me with this? Thanks
>



Re: When make , there are errors !!!!!!!!

2011-06-09 Thread zdenko podobny
I am sorry, but I do not have a crystal ball ;-)
Please provide the necessary details (which OS version you use, the exact
version of tesseract, your compilation steps...)
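
One way to gather those details before posting is a short shell helper like the sketch below. The function name report_env is made up for this example, and it assumes the installed tesseract binary understands --version; a very old build may just print its usage text instead.

```shell
#!/bin/sh
# Hypothetical helper: print the basic details a report to the list needs.
report_env() {
    echo "OS: $(uname -s) $(uname -r)"
    if command -v tesseract >/dev/null 2>&1; then
        # Assumption: this build accepts --version; keep only the first line.
        echo "tesseract: $(tesseract --version 2>&1 | head -n 1)"
    else
        echo "tesseract: not found in PATH"
    fi
}
```

Pasting the output of `report_env` together with the exact configure and make commands gives readers something concrete to work with.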

Zdenko

On Wed, Jun 8, 2011 at 9:02 PM, ビ  wrote:

> make  all-recursive
> Making all in ccstruct
> /bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../ccutil -I../cutil -I../image -I../viewer -I/opt/local/include -I/usr/local/include/leptonica -g -O2 -MT blobbox.lo -MD -MP -MF .deps/blobbox.Tpo -c -o blobbox.lo blobbox.cpp
> mv -f .deps/blobbox.Tpo .deps/blobbox.Plo
> mv: rename .deps/blobbox.Tpo to .deps/blobbox.Plo: No such file or directory
> make[3]: *** [blobbox.lo] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all] Error 2
>



Re: Tesseract doesn't work with a very simple example

2011-06-17 Thread zdenko podobny
First of all, please read the documentation, e.g. [1]. It can save you
time ;-).

[1]
http://code.google.com/p/tesseract-ocr/wiki/FAQ#Is_there_a_Minimum_Text_Size?_(It_won't_read_screen_text!)

Zdenko

On Fri, Jun 17, 2011 at 4:05 PM, Felipe Coutinho
wrote:

> Hello,
>
> I'm a new tess user. I'm trying to test tess with this very simple text
> image and I didn't succeed. This was the text that was recognized:
> vefysiqxe. I used this command line:
> tesseract C:\Users\felipelc\Desktop\simple.tif C:\Users\felipelc\Desktop\out
> What am I doing wrong?
>
> Regards,
>
> Felipe.
>
> --
> Felipe Leal Coutinho
> http://www.felipelc.com/
> http://www.facebook.com/felipelcoutinho
> http://twitter.com/felipelcout
>
> Softaware Soluções em Informática
> http://www.softaware.emp.br
>



Re: Training procedure

2011-06-21 Thread zdenko podobny
If you get an error about the font_properties file, send font_properties as well ;-)

Zdenko

On Tue, Jun 21, 2011 at 2:45 PM, Esteban Bordón  wrote:

> For example, using the files provided in
> http://tesseract-ocr.googlecode.com/files/boxtiff-2.01.spa.tar.gz and the
> command lines below:
>
> ]$ tesseract spa.cour.g4.tif spa.cour.g4 nobatch box.train
> ]$ unicharset_extractor spa.cour.g4.box
>
> These commands work OK, but I don't know how I must continue.
> If I run:
> ]$ mftraining -F font_properties -U unicharset spa.cour.g4.tr
> I get:
> Reading spa.cour.g4.tr ...
> spa.cour.g4 has no defined properties.
>
> Error: Illegal short name for a feature!
>
> Fatal error: No error trap defined!
> Signal_termination_handler called with signal 2000
>
> Now I'm trying in tesseract 3.00, where I can't use font_properties:
> [ebordon@ebordon ]$ mftraining -U unicharset spa.cour.g4.tr
> Reading spa.cour.g4.tr ...
> spa.cour.g4 has no defined properties.
>
> Error: Illegal short name for a feature!
>
> Fatal error: No error trap defined!
> Signal_termination_handler called with signal 2000
>
> and:
>
> [ebordon@ebordon ]$ cntraining spa.cour.g4.tr
> Reading spa.cour.g4.tr ...
>
> Error: Illegal short name for a feature!
>
> Fatal error: No error trap defined!
> Signal_termination_handler called with signal 2000
>
> Thanks,
> Esteban.
>
>
>
> 2011/6/20 Dmitri Silaev 
>
>> You have to show us your training images, resulted box files and all
>> used command lines.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>>
>>
>> On Mon, Jun 20, 2011 at 8:04 PM, Esteban Bordón 
>> wrote:
>> > Hi all!
>> >
>> > I'm working on a project that wants to digitize judicial expedients. We
>> > want to use tesseract but we haven't had great results.
>> > I think that if I train tesseract very specifically for the kind of font
>> > that the expedients use, we could improve the results, but I
>> > couldn't train my character set.
>> > I have installed tesseract 3.01 in Ubuntu 11.04 and I followed the
>> > instructions posted on
>> > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3.
>> > In the step
>> > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Run_Tesseract_for_Training
>> > I've got many FATALITIES and I don't know how I can fix them.
>> >
>> > I tried with the character set images used in spa training but I also had
>> > errors.
>> >
>> > Can somebody give me a simple step-by-step example of training tesseract
>> > for a specific charset?
>> > Thanks in advance,
>> > Esteban.
>> >
>>
>



Re: Training procedure

2011-06-21 Thread zdenko podobny
Which OS do you use, and which tesseract version?

Zdenko

PS: It worked on Windows XP with tesseract 3.00.

On Tue, Jun 21, 2011 at 3:17 PM, Esteban Bordón  wrote:

> Sorry, I forgot to attach it. Anyway, font_properties is used from v 3.01
> on, and I am using v 3.00
>
> cheers,
> Esteban.
>



Re: Creating DLL for tessract3

2011-06-22 Thread zdenko podobny
Please read ReadMe [1]

Unfortunately, tessdll was not removed from the source in time, so it became
part of the source code released as version 3.00. But it does not work. Have a
look at the tesseract-dev forum [2] and search there for 'tessdll' (and maybe
for 'wrapper') to get a better overview.

Look at the AddOns wiki [3]; maybe something there will work for you...

Zdenko

[1] http://code.google.com/p/tesseract-ocr/wiki/ReadMe#Windows
[2] http://groups.google.com/group/tesseract-dev
[3] http://code.google.com/p/tesseract-ocr/wiki/AddOns#Tesseract_3.0x

On Thu, Jun 23, 2011 at 5:04 AM, Saurabh Gandhi wrote:

> Within the Tesseract solution there is a project named tessdll. By
> compiling that, I have been able to create a dll out of it successfully, but
> somehow the dll fails to run without a VS2008 installation. Has anyone faced
> this issue?
>
> --
> Regards,
> Saurabh Gandhi
>
>
>
>
>
> 2011/6/23 Hoàng Văn Tú 
>
>> Me too. :D
>>
>> 2011/6/22 sisi :
>> > hello
>> >
>> > I want to know how can I create a DLL file for tesseract to use it in
>> > GUI application
>> >
>> >
>> > Thank you
>> >
>>
>



Re: Creating DLL for tessract3

2011-07-09 Thread zdenko podobny
On Sat, Jul 9, 2011 at 2:21 PM, Sarel van der Merwe wrote:

> look at this thread...
> https://mail.google.com/mail/?shva=1#inbox/130fd043420179a8

Why do you send a link to (your) Gmail inbox?

> On Sat, Jul 9, 2011 at 1:38 PM, Alexander Lubyagin
>  wrote:
> > On Jun 22, 4:09 pm, sisi  wrote:
> >> I want to know how can I create a DLL file for tesseract to use it in GUI
> >> application
> >
> > Please build the vs2010\tesseract.sln file (SVN) with Microsoft Visual
> > Studio 2010, in "Release build" mode.
> >
>



Re: Compiling Tesseract SVN under windows

2011-07-11 Thread zdenko podobny
Hi,

At the moment, only Visual C++ 2008 Express (it is free) is supported on
Windows (x86, 32-bit). In svn there is also support for VC2010...

Zdenko


On Mon, Jul 11, 2011 at 10:45 AM,  wrote:

> Hi all,
>
> It seems nobody knows how to compile tesseract using cygwin.
>
> Now I want to ask what is the easiest way to build tesseract under windows?
>
> Thanks for help.
>
> Greetings,
> Simon
>
> --
> Simon Eigeldinger
> simon.eigeldin...@vol.at
>



Re: read_params_file

2011-07-26 Thread zdenko podobny
On Mon, Jul 25, 2011 at 10:50 PM, Donald Hume  wrote:

> I have used the tesseract.exe from the SVN for Tesseract 3.0. You can
> see in this screenshot: http://j.drhu.me/TesseractBox.png

The image says you use tesseract 3.01 (not 3.0), i.e. you had to
compile Tesseract yourself.
As Dmitri pointed out, the error message says you are using the wrong (not
3.01) config file. See [1]: the parameter "tessedit_use_nn" was removed from
the config file 'box.train' in revision 526.

So update your box.train file to the recent version.
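
To check whether a given box.train still carries the removed parameter, a small grep wrapper is enough. This is only a sketch: the function name config_has_param is made up for this example, and it assumes the config file format of one "name value" pair per line.

```shell
#!/bin/sh
# Hypothetical check (not part of Tesseract): succeed if the config file
# sets the named parameter at the start of a line.
config_has_param() {
    param=$1
    file=$2
    # Parameter name at line start, followed by whitespace and its value.
    grep -Eq "^${param}[[:space:]]" "$file"
}
```

For example, `config_has_param tessedit_use_nn tessdata/configs/box.train && echo "outdated box.train"` would flag a config file that still contains the parameter named in the error message.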

Zdenko

[1]
http://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/tessdata/configs/box.train&format=side&r=526

> Previously I tried using jTessBox Editor and excel to correct the
> boxes and then save the box file using jTessBox Editor, which can
> handle 3.x box files. Is there a way I can just manually make the
> changes that need to be done to my box file? create.box does not make
> an accurate file and I must correct it by hand.
>
> Is there a way to fix this?
>
> Thanks,
>
> Don
>
> On Jul 25, 1:05 am, Dmitri Silaev  wrote:
> > Apparently you are trying to use a 2.xx version "box.train" config
> > file with version 3.xx. Use "box.train" that comes with version 3.xx
> > instead.
> >
> > Warm regards,
> > Dmitri Silaev
> > www.CustomOCR.com
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Jul 25, 2011 at 7:10 AM, Donald Hume 
> wrote:
> > > Hello!
> >
> > >   I have created a .tif and matching box file and made all of the
> > > necessary corrections to the box file. I am now trying to "train
> > > tesseract" but I keep receiving the following error:
> >
> > > tesseract eng.atlant.exp0.tif eng.altant.exp0 box.train
> >
> > > read_params_file: parameter not found: tessedit_use_nn
> >
> > > I can't find mention of this in the wiki. Might anyone know what
> > > I'm doing incorrectly? I would greatly appreciate any help.
> >
> > > Thanks,
> > > Don
> >
>



Re: Memory management in Tesseract

2011-07-27 Thread zdenko podobny
see:
http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h#245

Zdenko

On Wed, Jul 27, 2011 at 6:01 AM, Sandeep Parmar  wrote:

> Hello everyone,
>
> I am using the following code snippet, within this I would like to know
> whether 'GetUTF8Text' will destroy my source image 'arr_image' or not
> after performing recognition.
>
> api.SetImage((const unsigned char*)arr_image[loop_index3],
>              width[loop_index3],
>              height[loop_index3], 1,
>              widthstep[loop_index3]);
>
> char* text = api.GetUTF8Text();
>
> Any help highly appreciated.
>
> Thanks & Regards
> Sandeep
>



Re: Re: query on French Script MT tif images

2011-07-27 Thread zdenko podobny
If you are really interested in help, then provide an example image ;-)

Zdenko

On Wed, Jul 27, 2011 at 11:45 AM,  wrote:

> Hi,
>
> When I run the command tesseract fsmt.tif output,
> it gives me junk data "ȉY`I'I/2," for an image containing "Mentally"
> as the text in this font.
>
> Any ideas? Please help.
>
>
>
> On Jul 27, 2011 11:02am, sreekanth reddy  wrote:
> > Hi, I am also working on training French Script MT; if I get any positive
> > results, I will share them with you.
> >
> > --sreekanth
> >
> > On Wed, Jul 27, 2011 at 10:35 AM, syed arifullah badsha s
> > <syedarifbadsh...@gmail.com> wrote:
> >
> > The box files are not getting created properly. I am trying to train it,
> > but in vain; I will try again. If you have any box files or trained data,
> > kindly share them with me.
> >
> > On Tue, Jul 26, 2011 at 6:51 PM, Sven Pedersen <sven.peder...@gmail.com>
> > wrote:
> >
> > Hi Syed, how are you trying to OCR the image? What kind of failure message
> > are you getting? Is it a problem with the font, or with the image format?
> >
> > --Sven
> >
> > On Tue, Jul 26, 2011 at 2:20 AM, syedarifbadsh...@gmail.com
> > <syedarifbadsh...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Kindly help me in recognizing the French Script MT font that is in a
> > TIF image. Has anyone tried it?
> >
> > I have a sample TIF file, but I have no way to attach it here.
> >
> > Any info will help.
> >
> > --
> > ``All that is gold does not glitter,
> >   not all those who wander are lost;
> > the old that is strong does not wither,
> >   deep roots are not reached by the frost.
> >
> >
> >
> >
> > From the ashes a fire shall be woken,
> >   a light from the shadows shall spring;
> > renewed shall be blade that was broken,
> >   the crownless again shall be king.”



Re: Problem with training Tesseract 3.01 (svn r596)

2011-07-28 Thread zdenko podobny
As always - can you please send an example image + box file?

Zdenko

On Thu, Jul 28, 2011 at 9:26 AM, Sandeep Parmar  wrote:

> Hi,
> I am using English language fonts like 'Comic sans MS', 'Times','Arial'
> etc.
>
>
> On Thu, Jul 28, 2011 at 12:50 PM, Sriranga(78yrsold) <
> withblessi...@gmail.com> wrote:
>
>> Sandee
>> which lang  you are using for training purpose since you are using
>> cowboxer?
>>
>>
>> On Thu, Jul 28, 2011 at 12:26 PM, Sandeep Parmar <
>> sandeep.theart...@gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> I downloaded the latest tesseract 3.01 from the svn and was trying to
>>> train the tesseract for new fonts.
>>>
>>> I created the box files by following the command
>>> "tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] -l
>>> yournewlanguage batch.nochop makebox "
>>> given on training page of tesseract wiki.
>>>
>>> But when I saw the box file in Cowboxer, It was showing wrong value for
>>> almost all the characters of the image.
>>>
>>> I am not able to figure out what could be the reason for this as this is
>>> not the first time that I am training tesseract,
>>> I have succesfully trained Tesseract3.00 for new fonts already. But on
>>> Training Tesseract 3.01 I got the above problem in Box files genereated.
>>>
>>> Please Help.
>>>
>>> Thanks and Regards
>>> Sandeep
>>>
>>>


Re: [r596] "Error opening data file ./tessdata/xxx.traineddata"

2011-07-28 Thread zdenko podobny
On Thu, Jul 28, 2011 at 11:17 AM, 73r0  wrote:

> Hi,
>
> I downloaded the latest revision of Tesseract (r596) from SVN. Then I
> built it with vs2010 without any issues to get the tesseract.exe file. I
> added the path of tesseract.exe to the PATH environment variable so
> I can call tesseract in a shell without being in its directory. I also
> added the tessdata folder where tesseract.exe is located. tessdata
> contains the fra.traineddata file I got from SVN.
> I use this command in a shell: "tesseract myPicture.tiff output.txt -l fra"
> But I get this error: "Error opening data file ./tessdata/fra.traineddata".
>
It is clear: tesseract.exe expects the tessdata folder in the current
directory (./) because you did not set the TESSDATA_PREFIX environment variable.
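
The lookup described above can be sketched roughly as follows. This is an illustration only, not Tesseract's actual code: TrainedDataPath and TrainedDataPathFromEnv are hypothetical names, and the sketch assumes the 3.x convention that TESSDATA_PREFIX names the parent directory of tessdata.

```cpp
#include <cstdlib>
#include <string>

// Illustrative only, not Tesseract's actual code: builds the path that
// would be opened for a language. `prefix` stands for the value of
// TESSDATA_PREFIX (nullptr or empty when the variable is not set).
// Assumption: the prefix is the parent directory of "tessdata".
std::string TrainedDataPath(const char* prefix, const std::string& lang) {
  std::string base = (prefix != nullptr && *prefix != '\0') ? prefix : "./";
  // Make sure the prefix ends with a path separator before appending.
  if (base.back() != '/' && base.back() != '\\') base += '/';
  return base + "tessdata/" + lang + ".traineddata";
}

// Convenience wrapper that reads the real environment variable.
std::string TrainedDataPathFromEnv(const std::string& lang) {
  return TrainedDataPath(std::getenv("TESSDATA_PREFIX"), lang);
}
```

With the variable unset, the sketch yields ./tessdata/fra.traineddata for "fra", which matches the path in the error message; with it set, the file is looked for under that directory no matter where the shell is.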


> I test with the eng.traineddata too but same error. I noticed the is
> the same error when the file is not found, but in this case the files
> are in the right place.
>



Re: Problem with training Tesseract 3.01 (svn r596)

2011-07-28 Thread zdenko podobny
On Thu, Jul 28, 2011 at 12:16 PM, Sandeep Parmar <
sandeep.theart...@gmail.com> wrote:

> Hi Zdenko,
>
> these results are very much similar to the one which i got using older
> version(tesseract 3.0) but with 3.01 it was coming worst.
>
> Can you share your 3.01 exe files for training? I am using Windows XP 32
> bit. I will cross check it with the same input image and
> see whether i am able to get the similar results as yours.
>
First of all: 3.01 training is not documented, so I suggest using it only
for testing purposes (testers/hackers are welcome ;-) ).
My build can be found here [1] (built with VS 2008 on Windows XP SP3). I
just compressed it with upx to get a smaller exe.

Zdenko
[1] https://github.com/zdenop/qt-box-editor/downloads

> Thanks
> Sandeep
>
>
> On Thu, Jul 28, 2011 at 3:36 PM, zdenko podobny  wrote:
>
>> I run (svn r596) on Windows XP:
>> "tesseract eng.arial.tif eng.arial.301 batch.nochop makebox"
>> and I got totally different result (see attachment). I also tried 3.00 and
>> it gave me similar result as r596 (yes there are differences).
>> What OS you use?
>>
>> Zdenko
>>
>> On Thu, Jul 28, 2011 at 10:11 AM, Sandeep Parmar <
>> sandeep.theart...@gmail.com> wrote:
>>
>>> hi zdenko/sriranga,
>>>
>>> please find the zipped folder attached here with.
>>>
>>> sandeep
>>>
>>>
>>> On Thu, Jul 28, 2011 at 1:19 PM, Sriranga(78yrsold) <
>>> withblessi...@gmail.com> wrote:
>>>
>>>> @Sandeep,
>>>> As  suggested by Zdenko Podobny, please forward sample images with its
>>>> box files?
>>>>
>>>>
>>>> On Thu, Jul 28, 2011 at 1:08 PM, zdenko podobny wrote:
>>>>
>>>>> As always - can you please send example image + box file?
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Thu, Jul 28, 2011 at 9:26 AM, Sandeep Parmar <
>>>>> sandeep.theart...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am using English language fonts like 'Comic sans MS',
>>>>>> 'Times','Arial' etc.
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 28, 2011 at 12:50 PM, Sriranga(78yrsold) <
>>>>>> withblessi...@gmail.com> wrote:
>>>>>>
>>>>>>> Sandee
>>>>>>> which lang  you are using for training purpose since you are using
>>>>>>> cowboxer?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 28, 2011 at 12:26 PM, Sandeep Parmar <
>>>>>>> sandeep.theart...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Everyone,
>>>>>>>>
>>>>>>>> I downloaded the latest tesseract 3.01 from the svn and was trying
>>>>>>>> to train the tesseract for new fonts.
>>>>>>>>
>>>>>>>> I created the box files by following the command
>>>>>>>> "tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num
>>>>>>>> ] -l yournewlanguage batch.nochop makebox "
>>>>>>>> given on training page of tesseract wiki.
>>>>>>>>
>>>>>>>> But when I saw the box file in Cowboxer, It was showing wrong value
>>>>>>>> for almost all the characters of the image.
>>>>>>>>
>>>>>>>> I am not able to figure out what could be the reason for this as
>>>>>>>> this is not the first time that I am training tesseract,
>>>>>>>> I have succesfully trained Tesseract3.00 for new fonts already. But
>>>>>>>> on Training Tesseract 3.01 I got the above problem in Box files 
>>>>>>>> genereated.
>>>>>>>>
>>>>>>>> Please Help.
>>>>>>>>
>>>>>>>> Thanks and Regards
>>>>>>>> Sandeep
>>>>>>>>
>>>>>>>>

Re: Problem with training Tesseract 3.01 (svn r596)

2011-07-28 Thread zdenko podobny
On Thu, Jul 28, 2011 at 2:19 PM, Sriranga(78yrsold)  wrote:

> Hi zdenko,
> Just now i dowloaded tesseract.exe(r596)  However I find no other exe files
> like unicharset.exe, mftraining.exe, cntraining.exe are not available for
> download.  Where from i can download the missing exe files?
> With warmest Regards,
> -sriranga(78yrs)
>
You can build them yourself. This is not an official release; it was published
just to test one step of training.


>
> On Thu, Jul 28, 2011 at 5:32 PM, zdenko podobny  wrote:
>
>>
>>
>> On Thu, Jul 28, 2011 at 12:16 PM, Sandeep Parmar <
>> sandeep.theart...@gmail.com> wrote:
>>
>>> Hi Zdenko,
>>>
>>> these results are very much similar to the one which i got using older
>>> version(tesseract 3.0) but with 3.01 it was coming worst.
>>>
>>> Can you share your 3.01 exe files for training? I am using Windows XP 32
>>> bit. I will cross check it with the same input image and
>>> see whether i am able to get the similar results as yours.
>>>
>> First of all: 3.01 training is not documented, so I suggest using it only
>> for testing purposes (testers/hackers are welcome ;-) ).
>> My build can be found here [1] (built with VS 2008 on Windows XP SP3). I
>> just compressed it with upx to get a smaller exe.
>>
>> Zdenko
>> [1] https://github.com/zdenop/qt-box-editor/downloads
>>
>> Thanks
>>> Sandeep
>>>
>>>
>>> On Thu, Jul 28, 2011 at 3:36 PM, zdenko podobny wrote:
>>>
>>>> I run (svn r596) on Windows XP:
>>>> "tesseract eng.arial.tif eng.arial.301 batch.nochop makebox"
>>>> and I got totally different result (see attachment). I also tried 3.00
>>>> and it gave me similar result as r596 (yes there are differences).
>>>> What OS you use?
>>>>
>>>> Zdenko
>>>>
>>>> On Thu, Jul 28, 2011 at 10:11 AM, Sandeep Parmar <
>>>> sandeep.theart...@gmail.com> wrote:
>>>>
>>>>> hi zdenko/sriranga,
>>>>>
>>>>> please find the zipped folder attached here with.
>>>>>
>>>>> sandeep
>>>>>
>>>>>
>>>>> On Thu, Jul 28, 2011 at 1:19 PM, Sriranga(78yrsold) <
>>>>> withblessi...@gmail.com> wrote:
>>>>>
>>>>>> @Sandeep,
>>>>>> As  suggested by Zdenko Podobny, please forward sample images with its
>>>>>> box files?
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 28, 2011 at 1:08 PM, zdenko podobny wrote:
>>>>>>
>>>>>>> As always - can you please send example image + box file?
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 28, 2011 at 9:26 AM, Sandeep Parmar <
>>>>>>> sandeep.theart...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am using English language fonts like 'Comic sans MS',
>>>>>>>> 'Times','Arial' etc.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 28, 2011 at 12:50 PM, Sriranga(78yrsold) <
>>>>>>>> withblessi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sandee
>>>>>>>>> which lang  you are using for training purpose since you are using
>>>>>>>>> cowboxer?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 28, 2011 at 12:26 PM, Sandeep Parmar <
>>>>>>>>> sandeep.theart...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Everyone,
>>>>>>>>>>
>>>>>>>>>> I downloaded the latest tesseract 3.01 from the svn and was trying
>>>>>>>>>> to train the tesseract for new fonts.
>>>>>>>>>>
>>>>>>>>>> I created the box files by following the command
>>>>>>>>>> "tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[
>>>>>>>>>> num] -l yournewlanguage batch.nochop makebox "
>>>>>>>>>> given on training page of tesseract wiki.
>>>>>>>>>>
>>>>

Re: "Error opening data file ./tessdata/xxx.traineddata"

2011-07-28 Thread zdenko podobny
AFAIR this error means that the version of the tessdata file(s)
(xxx.traineddata) does not match the version of tesseract.
Check whether you have 3.01 data files (traineddata) in the tessdata folder.

Zdenko

On Thu, Jul 28, 2011 at 2:09 PM, 73r0  wrote:

> Thanks a lot for the answer that was the problem. It was obviously a
> stupid question, I didn't read the ReadMe file carefully enough.
> Still, I want to point out that the Tesseract I compiled (r598) with
> vs2010 throws an error:
> "actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert
> failed:in file
>  ..\ccutil\tessdatamanager.cpp, line 55"
>
> Whereas the Tesseract I got from the Download page (r517) works
> perfectly.
>
> Thanks for your awesome work and support.
>



Re: Error in Box Train for Tesseract3.01(svn r596)

2011-07-30 Thread zdenko podobny
On Sat, Jul 30, 2011 at 6:43 AM, Sandeep Parmar  wrote:

> Dear all,
>
> I am getting following error while training the Box files
>
First of all: always write the exact command you use!


> "read_params_file: parameter not found: tessedit_use_nn"
>
This means that your config file contains the parameter tessedit_use_nn,
which is no longer used.
Either your code (configs) is out of date, as I already mentioned on the
list [1], or your TESSDATA_PREFIX points to the wrong directory (a directory
with old configs).

Zdenko

[1] http://groups.google.com/group/tesseract-ocr/msg/3c246de9c4724359?hl=en

> What could be the reason for this error? I tried searching for the answer on
> the forum but it didn't work.
>
> Please help.
>
> Thanks and regards
> Sandeep
>


