On 05.06.2010 14:57, Jimmy O'Regan wrote:
> On Saturday, June 5, 2010, zdpo <zde...@gmail.com> wrote:
>   
>> Dear Sriranga,
>>
>> your box file is wrong (for tesseract 3.0 and >r319). It does not match
>> the description in "Make Box Files" on
>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract.
>>
>> BTW: I am not aware of any tool that supports this new box format (for
>> multipage tif).
>>
>>     
> It shouldn't matter. The code is supposed to accept the old style too,
> provided that the number of pages is set to zero, which is determined
> by the image reading code, which doesn't work on Windows.
>
> If it fails on Linux, then I'd consider it a bug.
>
>   

    /usr/local/bin/tesseract slk.arial.001.tif slk.arial.001 makebox batch.nochop


created a slk.arial.001.box file with 6 columns (the last one containing 0).
When I run:

    /usr/local/bin/unicharset_extractor slk.arial.001.box

the output is OK. When I convert it to the 2.x format with

    awk '{print $1" "$2" "$3" "$4" "$5}' <slk.arial.001.box >slk.arial.002.box

and run:

    /usr/local/bin/unicharset_extractor slk.arial.002.box

I got errors:

    Extracting unicharset from slk.arial.002.box
    Box file format error on line 1 ignored
    ...

Anyway, if Sriranga's tesseract 3.0 produced the old format, then something
is wrong in his (Windows) installation or compilation process. Or maybe he
simply mixed outputs from tesseract 2.x with 3.0...
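
A quick way to see which format a given box file is in is to count the
fields on each line. The following is only a rough sketch: the function name
box_field_counts is made up for illustration, and it assumes space-separated
fields, with 5 fields per line for the 2.x style and 6 fields (trailing page
number) for the 3.0 style, as observed above.

```shell
# Hypothetical helper: print "<field count> <number of lines>" for a box file.
# Assumption: 5 fields = 2.x style, 6 fields = 3.0 style (page number last).
box_field_counts() {
    awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}
```

Calling it as `box_field_counts slk.arial.001.box` on a file that mixes both
styles would list more than one field count, which would point to mixed
outputs rather than a broken build.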

Zd.
